Paper Digest: ECCV 2018 Highlights
The European Conference on Computer Vision (ECCV) is one of the top computer vision conferences in the world. In 2018, it is to be held in Munich, Germany. There were 2,439 paper submissions, of which 776 were accepted (59 orals, 717 posters).
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: ECCV 2018 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Semi-convolutional Operators for Instance Segmentation | David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi | In this paper we show theoretically and empirically that constructing dense pixel embeddings that can separate object instances cannot be easily achieved using convolutional operators. |
2 | Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani, Samuel Albanie, Andrew Zisserman | We propose and investigate an identity sensitive joint embedding of face and voice. |
3 | Learning-based Video Motion Magnification | Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik | In this paper, we seek to learn the filters directly from examples using deep convolutional neural networks. |
4 | Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation | Xiaoxiao Li, Chen Change Loy | In this study, we formulate a deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion. |
5 | CBAM: Convolutional Block Attention Module | Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon | We propose Convolutional Block Attention Module (CBAM), a simple and effective attention module that can be integrated with any feed-forward convolutional neural networks. |
6 | BodyNet: Volumetric Inference of 3D Human Body Shapes | Gul Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid | In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. |
7 | CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces | Satoshi Ikehata | This paper presents a photometric stereo network that directly learns relationships between the photometric stereo input and surface normals of a scene. For training the network, we create a synthetic photometric stereo dataset that is generated by a physics-based renderer, therefore the global light transport is considered. |
8 | Spatio-temporal Transformer Network for Video Restoration | Tae Hyun Kim, Mehdi S. M. Sajjadi, Michael Hirsch, Bernhard Scholkopf | To alleviate these problems, we propose a novel Spatio-temporal Transformer Network (STTN) which handles multiple frames at once and thereby manages to mitigate the common nuisance of occlusions in optical flow estimation. |
9 | PS-FCN: A Flexible Learning Framework for Photometric Stereo | Guanying Chen, Kai Han, Kwan-Yee K. Wong | In this paper, we propose a deep fully convolutional network, called PS-FCN, that takes an arbitrary number of images of a static object captured under different light directions with a fixed camera as input, and predicts a normal map of the object in a fast feed-forward pass. |
10 | Dynamic Conditional Networks for Few-Shot Learning | Fang Zhao, Jian Zhao, Shuicheng Yan, Jiashi Feng | This paper proposes a novel Dynamic Conditional Convolutional Network (DCCN) to handle conditional few-shot learning, i.e., only a few training samples are available for each condition. |
11 | Deep Factorised Inverse-Sketching | Kaiyue Pang, Da Li, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales | In this paper we study this sketching process and attempt to invert it. |
12 | Separating Reflection and Transmission Images in the Wild | Patrick Wieschollek, Orazio Gallo, Jinwei Gu, Jan Kautz | We present a deep learning approach to separate the reflected and the transmitted components of the recorded irradiance, which explicitly uses the polarization properties of light. |
13 | Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions | Konda Reddy Mopuri, Phani Krishna Uppala, R. Venkatesh Babu | In this paper, for data-free scenarios, we propose a novel approach that emulates the effect of data samples with class impressions in order to craft UAPs using data-driven objectives. |
14 | Rendering Portraitures from Monocular Camera and Beyond | Xiangyu Xu, Deqing Sun, Sifei Liu, Wenqi Ren, Yu-Jin Zhang, Ming-Hsuan Yang, Jian Sun | In this work, we introduce an automatic system that achieves portrait DoF rendering for monocular cameras. |
15 | Object Level Visual Reasoning in Videos | Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori | We propose a model capable of learning to reason about semantically meaningful spatio-temporal interactions in videos. |
16 | Dense Pose Transfer | Natalia Neverova, Riza Alp Guler, Iasonas Kokkinos | In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor. |
17 | Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning | Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan | In this paper, we propose a novel model with spatial reasoning and temporal stack learning (SR-TSL) for skeleton-based action recognition, which consists of a spatial reasoning network (SRN) and a temporal stack learning network (TSLN). |
18 | Learning to Segment via Cut-and-Paste | Tal Remez, Jonathan Huang, Matthew Brown | This paper presents a weakly-supervised approach to object instance segmentation. |
19 | Deep Boosting for Image Denoising | Chang Chen, Zhiwei Xiong, Xinmei Tian, Feng Wu | In this paper, we propose a novel deep boosting framework (DBF) for denoising, which integrates several convolutional networks in a feed-forward fashion. |
20 | Fictitious GAN: Training GANs with Historical Models | Hao Ge, Yin Xia, Xu Chen, Randall Berry, Ying Wu | Inspired by the fictitious play learning process, a novel training method, referred to as Fictitious GAN, is introduced. |
21 | Self-Supervised Relative Depth Learning for Urban Scene Understanding | Huaizu Jiang, Gustav Larsson, Michael Maire, Greg Shakhnarovich, Erik Learned-Miller | In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. |
22 | Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss | Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau | In this paper, we investigate the long tail property and delve deeper into the distant depth regions (i.e. the tail part) to propose an attention-driven loss for the network supervision. |
23 | Bi-box Regression for Pedestrian Detection and Occlusion Estimation | Chunluan Zhou, Junsong Yuan | In this paper, we propose a novel approach to simultaneous pedestrian detection and occlusion estimation by regressing two bounding boxes to localize the full body as well as the visible part of a pedestrian respectively. |
24 | C-WSL: Count-guided Weakly Supervised Localization | Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis | We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL). |
25 | Convolutional Networks with Adaptive Inference Graphs | Andreas Veit, Serge Belongie | In this work, we propose convolutional networks with adaptive inference graphs (ConvNet-AIG) that adaptively define their network topology conditioned on the input image. |
26 | Summarizing First-Person Videos from Third Persons’ Points of View | Hsuan-I Ho, Wei-Chen Chiu, Yu-Chiang Frank Wang | With the goal of deriving an effective model to summarize first-person videos, we propose a novel deep neural network architecture for describing and discriminating vital spatiotemporal information across videos with different points of view. |
27 | Programmable Triangulation Light Curtains | Jian Wang, Joseph Bartels, William Whittaker, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan | We introduce a novel device that monitors the presence of objects on a virtual shell near the device, which we refer to as a light curtain. |
28 | Learning Single-View 3D Reconstruction with Limited Pose Supervision | Guandao Yang, Yin Cui, Serge Belongie, Bharath Hariharan | We present a unified framework that can combine both types of supervision: a small amount of camera pose annotations are used to enforce pose-invariance and view-point consistency, and unlabeled images combined with an adversarial loss are used to enforce the realism of rendered, generated models. |
29 | Maximum Margin Metric Learning Over Discriminative Nullspace for Person Re-identification | T M Feroz Ali, Subhasis Chaudhuri | In this paper we propose a novel metric learning framework called Nullspace Kernel Maximum Margin Metric Learning (NK3ML) which efficiently addresses the small sample size (SSS) problem inherent in person re-identification and offers a significant performance gain over existing state-of-the-art methods. |
30 | Snap Angle Prediction for 360° Panoramas | Bo Xiong, Kristen Grauman | To discover the relationship between these optimal snap angles and the spherical panorama’s content, we develop a reinforcement learning approach for the cubemap projection model. |
31 | Memory Aware Synapses: Learning what (not) to forget | Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars | In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. |
32 | Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks | Adria Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba | We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task. |
33 | Weakly- and Semi-Supervised Panoptic Segmentation | Qizhu Li, Anurag Arnab, Philip H.S. Torr | We present a weakly supervised model that jointly performs both semantic- and instance-segmentation — a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. |
34 | K-convexity shape priors for segmentation | Hossam Isack, Lena Gorelick, Karin Ng, Olga Veksler, Yuri Boykov | As shown in the paper, for many forms of convexity our regularization model is significantly more descriptive for any given k. |
35 | Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images | Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang | We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. |
36 | Boosted Attention: Leveraging Human Attention for Image Captioning | Shi Chen, Qi Zhao | Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. |
37 | Incremental Multi-graph Matching via Diversity and Randomness based Graph Clustering | Tianshu Yu, Junchi Yan, Wei Liu, Baoxin Li | In this paper, we present an incremental multi-graph matching approach, which deals with the arriving graph utilizing the previous matching results under the global consistency constraint. |
38 | Multi-view to Novel view: Synthesizing novel views with Self-Learned Confidence | Shao-Hua Sun, Minyoung Huh, Yuan-Hong Liao, Ning Zhang, Joseph J. Lim | In this paper, we address the task of multi-view novel view synthesis, where we are interested in synthesizing a target image with an arbitrary camera pose from given source images. |
39 | Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation | Markus Oberweger, Mahdi Rad, Vincent Lepetit | We introduce a novel method for robust and accurate 3D object pose estimation from a single color image under large occlusions. |
40 | Image Inpainting for Irregular Holes Using Partial Convolutions | Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro | We propose to use partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. |
41 | Audio-Visual Scene Analysis with Self-Supervised Multisensory Features | Andrew Owens, Alexei A. Efros | In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation. |
42 | Fighting Fake News: Image Splice Detection via Learned Self-Consistency | Minyoung Huh, Andrew Liu, Andrew Owens, Alexei A. Efros | In this paper, we introduce a self-supervised method for learning to detect visual manipulations using only unlabeled data. |
43 | End-to-End Joint Semantic Segmentation of Actors and Actions in Video | Jingwei Ji, Shyamal Buch, Alvaro Soto, Juan Carlos Niebles | In this work, we propose a new end-to-end architecture for tackling this task in videos. |
44 | Visual Text Correction | Amir Mazaheri, Mubarak Shah | In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. |
45 | Deep Co-Training for Semi-Supervised Image Recognition | Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, Alan Yuille | In this paper, we study the problem of semi-supervised image recognition, which is to learn classifiers using both labeled and unlabeled images. |
46 | Progressive Neural Architecture Search | Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy | We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. |
47 | Explainable Neural Computation via Stack Neural Module Networks | Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko | In this paper, we present a novel neural modular approach that performs compositional reasoning by automatically inducing a desired sub-task decomposition without relying on strong supervision. |
48 | Attributes as Operators: Factorizing Unseen Attribute-Object Compositions | Tushar Nagarajan, Kristen Grauman | We present a new approach to modeling visual attributes. |
49 | Scalable Exemplar-based Subspace Clustering on Class-Imbalanced Data | Chong You, Chi Li, Daniel P. Robinson, Rene Vidal | This paper presents an exemplar-based subspace clustering method to tackle the problem of imbalanced and large-scale datasets. |
50 | RCAA: Relational Context-Aware Agents for Person Search | Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann | In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images. |
51 | Product Quantization Network for Fast Image Retrieval | Tan Yu, Junsong Yuan, Chen Fang, Hailin Jin | By extending the hard assignment to soft assignment, we make it feasible to incorporate the product quantization as a layer of a convolutional neural network and propose our product quantization network. |
52 | Hand Pose Estimation via Latent 2.5D Heatmap Regression | Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, Jan Kautz | In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation. |
53 | Multimodal Unsupervised Image-to-image Translation | Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz | To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. |
54 | Depth-aware CNN for RGB-D Segmentation | Weiyue Wang, Ulrich Neumann | To address these issues, we present Depth-aware CNN by introducing two intuitive, flexible and effective operations: depth-aware convolution and depth-aware average pooling. |
55 | Visual Coreference Resolution in Visual Dialog using Neural Module Networks | Satwik Kottur, Jose M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach | In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level. |
56 | Learning Blind Video Temporal Consistency | Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, Ming-Hsuan Yang | In this paper, we present an efficient end-to-end approach based on deep recurrent network for enforcing temporal consistency in a video. Our method takes the original unprocessed and per-frame processed videos as inputs to produce a temporally consistent video. Consequently, our approach is agnostic to specific image processing algorithms applied on the original video. We train the proposed network by minimizing both short-term and long-term temporal losses as well as the perceptual loss to strike a balance between temporal stability and perceptual similarity with the processed frames. At test time, our model does not require computing optical flow and thus achieves real-time speed even for high-resolution videos. |
57 | Diverse Image-to-Image Translation via Disentangled Representations | Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang | In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. |
58 | Learning to Blend Photos | Wei-Chih Hung, Jianming Zhang, Xiaohui Shen, Zhe Lin, Joon-Young Lee, Ming-Hsuan Yang | To make photo blending accessible to general public, we propose an efficient approach for automatic photo blending via deep learning. |
59 | Switchable Temporal Propagation Network | Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz | In this paper, we propose a learnable unified framework for propagating a variety of visual properties of video images, including but not limited to color, high dynamic range (HDR), and segmentation mask, where the properties are available for only a few key-frames. |
60 | Deeply Learned Compositional Models for Human Pose Estimation | Wei Tang, Pei Yu, Ying Wu | To address these issues, this paper introduces a novel framework, termed as Deeply Learned Compositional Model (DLCM), for HPE. |
61 | Unsupervised Video Object Segmentation with Motion-based Bilateral Networks | Siyang Li, Bryan Seybold, Alexey Vorobyov, Xuejing Lei, C.-C. Jay Kuo | In this work, we study the unsupervised video object segmentation problem where moving objects are segmented without prior knowledge of these objects. |
62 | CornerNet: Detecting Objects as Paired Keypoints | Hei Law, Jia Deng | We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. |
63 | Unsupervised holistic image generation from key local patches | Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh | In this work, key local patches are defined as informative regions of the target object or scene. We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior. |
64 | Group Normalization | Yuxin Wu, Kaiming He | In this paper, we present Group Normalization (GN) as a simple alternative to BN. |
65 | Generalizing A Person Retrieval Model Hetero- and Homogeneously | Zhun Zhong, Liang Zheng, Shaozi Li, Yi Yang | To this end, we introduce a Hetero-Homogeneous Learning (HHL) method. |
66 | CAR-Net: Clairvoyant Attentive Recurrent Network | Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, Silvio Savarese | We present an interpretable framework for path prediction that leverages dependencies between agents’ behaviors and their spatial navigation environment. To study the impact of space on agents’ trajectories, we build a new dataset made of top-view images of hundreds of scenes (Formula One racing tracks) where agents’ behaviors are heavily influenced by known areas in the images (e.g., upcoming turns). |
67 | Cross-Modal Hamming Hashing | Yue Cao, Bin Liu, Mingsheng Long, Jianmin Wang | This work presents Cross-Modal Hamming Hashing (CMHH), a novel deep cross-modal hashing approach that generates compact and highly concentrated hash codes to enable efficient and effective Hamming space retrieval. |
68 | PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction | Yifei Shi, Kai Xu, Matthias Niessner, Szymon Rusinkiewicz, Thomas Funkhouser | We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction. |
69 | DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency | Yuliang Zou, Zelun Luo, Jia-Bin Huang | We present an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences. |
70 | Distractor-aware Siamese Networks for Visual Object Tracking | Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu | In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. |
71 | Multiresolution Tree Networks for 3D Point Cloud Processing | Matheus Gadelha, Rui Wang, Subhransu Maji | We present multiresolution tree-structured networks to process point clouds for 3D shape understanding and generation tasks. |
72 | Propagating LSTM: 3D Pose Estimation based on Joint Interdependency | Kyoungoh Lee, Inwoong Lee, Sanghoon Lee | We present a novel 3D pose estimation method based on joint interdependency (JI) for acquiring 3D joints from the human pose of an RGB image. |
73 | Deep Video Quality Assessor: From Spatio-temporal Visual Sensitivity to A Convolutional Neural Aggregation Network | Woojae Kim, Jongyoo Kim, Sewoong Ahn, Jinwoo Kim, Sanghoon Lee | In this paper, we propose a novel full-reference (FR) VQA framework named Deep Video Quality Assessor (DeepVQA) to quantify the spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN). |
74 | Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground | Deng-Ping Fan, Ming-Ming Cheng, Jiang-Jiang Liu, Shang-Hua Gao, Qibin Hou, Ali Borji | We provide a comprehensive evaluation of salient object detection (SOD) models. Then, we propose a new high-quality dataset and update the previous saliency benchmark. |
75 | Face Recognition with Contrastive Convolution | Chunrui Han, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen | Inspired, we propose a novel CNN structure with what we referred to as contrastive convolution, which specifically focuses on the distinct characteristics between the two faces to compare, i.e., those contrastive characteristics. |
76 | Monocular Depth Estimation with Affinity, Vertical Pooling, and Label Enhancement | Yukang Gan, Xiangyu Xu, Wenxiu Sun, Liang Lin | In the proposed algorithm we introduce vertical pooling to aggregate image features vertically to improve the depth accuracy. Furthermore, since the Lidar depth ground truth is quite sparse, we enhance the depth labels by generating high-quality dense depth maps with an off-the-shelf stereo matching method which takes left-right image pairs as input. We also integrate multi-scale structures in our network to obtain a global understanding of the image depth and exploit residual learning to help depth refinement. We demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively on the KITTI driving dataset. |
77 | Domain Adaptation through Synthesis for Unsupervised Person Re-identification | Slawomir Bak, Peter Carr, Jean-Francois Lalonde | To achieve better accuracy in unseen illumination conditions we propose a novel domain adaptation technique that takes advantage of our synthetic data and performs fine-tuning in a completely unsupervised way. To alleviate this problem, we introduce a new synthetic dataset that contains hundreds of illumination conditions. |
78 | Adding Attentiveness to the Neurons in Recurrent Neural Networks | Pengfei Zhang, Jianru Xue, Cuiling Lan, Wenjun Zeng, Zhanning Gao, Nanning Zheng | We propose adding a simple yet effective Element-wise-Attention Gate (EleAttG) to an RNN block (e.g., all RNN neurons in a network layer) that empowers the RNN neurons to have the attentiveness capability. |
79 | Neural Stereoscopic Image Style Transfer | Xinyu Gong, Haozhi Huang, Lin Ma, Fumin Shen, Wei Liu, Tong Zhang | In this paper, we propose a novel dual path network for view-consistent style transfer on stereoscopic images. |
80 | Learning Dynamic Memory Networks for Object Tracking | Tianyu Yang, Antoni B. Chan | In this paper, we propose a dynamic memory network to adapt the template to the target’s appearance variations during tracking. |
81 | Gray-box Adversarial Training | B. S. Vivek, Konda Reddy Mopuri, R. Venkatesh Babu | In this paper, we (i) demonstrate the drawbacks of the existing evaluation policy, (ii) introduce novel variants of white-box and black-box attacks, dubbed “gray-box adversarial attacks”, based on which we propose a novel evaluation method to assess the robustness of the learned models, and (iii) propose a novel variant of adversarial training, named “Gray-box Adversarial Training”, that uses intermediate versions of the models to seed the adversaries. |
82 | GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints | Zixin Luo, Tianwei Shen, Lei Zhou, Siyu Zhu, Runze Zhang, Yao Yao, Tian Fang, Long Quan | In this paper, we mitigate this limitation by proposing a novel local descriptor learning approach that integrates geometry constraints from multi-view reconstructions, which benefit the learning process in data generation, data sampling and loss computation. |
83 | Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks | Minjun Li, Haozhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yugang Jiang | In this paper, we propose novel Stacked Cycle-Consistent Adversarial Networks (SCANs) by decomposing a single translation into multi-stage transformations, which not only boost the image translation quality but also enable higher resolution image-to-image translation in a coarse-to-fine fashion. |
84 | Light Structure from Pin Motion: Simple and Accurate Point Light Calibration for Physics-based Modeling | Hiroaki Santo, Michael Waechter, Masaki Samejima, Yusuke Sugano, Yasuyuki Matsushita | We present a practical method for geometric point light source calibration. |
85 | Find and Focus: Retrieve and Localize Video Events with Natural Language Queries | Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin | In this work, we aim to move beyond this limitation by delving into the internal structures of both sides, the queries and the videos. |
86 | Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane | Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng | Inspired by the pioneering work of information bottleneck principle for Deep Neural Networks (DNNs) analysis, we design an information plane based framework to evaluate the capability of DNNs for image classification tasks, which not only helps understand the capability of DNNs, but also helps us choose a neural network which leads to higher classification accuracy more efficiently. |
87 | Super-Identity Convolutional Neural Network for Face Hallucination | Kaipeng Zhang, Zhanpeng Zhang, Chia-Wen Cheng, Winston H. Hsu, Yu Qiao, Wei Liu, Tong Zhang | To overcome this challenge, we present a domain-integrated training approach by constructing a robust identity metric for faces from these two domains. |
88 | SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network | Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem | To deal with small object detection problem, we propose an end-to-end multi-task generative adversarial network (MTGAN). |
89 | Face Super-resolution Guided by Facial Component Heatmaps | Xin Yu, Basura Fernando, Bernard Ghanem, Fatih Porikli, Richard Hartley | State-of-the-art face super-resolution methods use deep convolutional neural networks to learn a mapping between low-resolution (LR) facial patterns and their corresponding high-resolution (HR) counterparts by exploring local information. |
90 | ML-LocNet: Improving Object Localization with Multi-view Learning Network | Xiaopeng Zhang, Yang Yang, Jiashi Feng | We propose a Multi-view Learning Localization Network (ML-LocNet) by incorporating multi-view learning into a two-phase WSOL model. |
91 | Facial Expression Recognition with Inconsistently Annotated Datasets | Jiabei Zeng, Shiguang Shan, Xilin Chen | To address the inconsistency, we propose an Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework to train a FER model from multiple inconsistently labeled datasets and large scale unlabeled data. |
92 | Visual Question Answering as a Meta Learning Task | Damien Teney, Anton van den Hengel | We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. |
93 | Deformable Pose Traversal Convolution for 3D Action and Gesture Recognition | Junwu Weng, Mengyuan Liu, Xudong Jiang, Junsong Yuan | Rather than directly representing the 3D pose using its joint locations, in this paper, we propose Deformable Pose Traversal Convolution which applies one-dimensional convolution to traverse the 3D pose to represent it. |
94 | Semi-Dense 3D Reconstruction with a Stereo Event Camera | Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, Davide Scaramuzza | This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. |
95 | What do I Annotate Next? An Empirical Study of Active Learning for Action Localization | Fabian Caba Heilbron, Joon-Young Lee, Hailin Jin, Bernard Ghanem | In this paper, we introduce a novel active learning framework for temporal localization that aims to mitigate this data dependency issue. |
96 | HybridNet: Classification and Reconstruction Cooperation for Semi-Supervised Learning | Thomas Robert, Nicolas Thome, Matthieu Cord | In this paper, we introduce a new model for leveraging unlabeled data to improve generalization performances of image classifiers: a two-branch encoder-decoder architecture called HybridNet. |
97 | Self-Calibrating Isometric Non-Rigid Structure-from-Motion | Shaifali Parashar, Adrien Bartoli, Daniel Pizarro | We present self-calibrating isometric non-rigid structure-from-motion (SCIso-NRSfM), the first method to reconstruct a non-rigid object from at least three monocular images with constant but unknown focal length. |
98 | Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields | Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, Dacheng Tao, Mingli Song | In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control. |
99 | Reverse Attention for Salient Object Detection | Shuhan Chen, Xiuli Tan, Ben Wang, Xuelong Hu | To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. |
100 | Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization | Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem | To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video. |
101 | Diagnosing Error in Temporal Action Detectors | Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem | To this end, we introduce a new diagnostic tool to analyze the performance of temporal action detectors in videos and compare different methods beyond a single scalar metric. |
102 | Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation | Helge Rhodin, Mathieu Salzmann, Pascal Fua | In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations. |
103 | Massively Parallel Video Networks | Joao Carreira, Viorica Patraucean, Laurent Mazare, Andrew Zisserman, Simon Osindero | We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. |
104 | Transductive Centroid Projection for Semi-supervised Large-scale Recognition | Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang | Specifically, we design the TCP module by dynamically adding an ad hoc anchor for each cluster in one mini-batch. |
105 | PSANet: Point-wise Spatial Attention Network for Scene Parsing | Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia | In this paper, we propose the point-wise spatial attention network (PSANet) to relax the local neighborhood constraint. |
106 | Robust Anchor Embedding for Unsupervised Video Person Re-Identification in the Wild | Mang Ye, Xiangyuan Lan, Pong C. Yuen | To achieve it, we propose a novel Robust AnChor Embedding (RACE) framework via deep feature representation learning for large-scale unsupervised video re-ID. |
107 | Semi-Supervised Deep Learning with Memory | Yanbei Chen, Xiatian Zhu, Shaogang Gong | In this work, we propose a novel Memory-Assisted Deep Neural Network (MA-DNN) capable of exploiting the memory of model learning to enable semi-supervised learning. |
108 | Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline | Zhenbo Xu, Wei Yang, Ajin Meng, Nanxue Lu, Huan Huang, Changchun Ying, Liusheng Huang | In this paper, we introduce CCPD, a large and comprehensive LP dataset. |
109 | Repeatability Is Not Enough: Learning Affine Regions via Discriminability | Dmytro Mishkin, Filip Radenovic, Jiri Matas | A method for learning local affine-covariant regions is presented. |
110 | Learning Warped Guidance for Blind Face Restoration | Xiaoming Li, Ming Liu, Yuting Ye, Wangmeng Zuo, Liang Lin, Ruigang Yang | For better recovery of fine facial details, we modify the problem setting by taking both the degraded observation and a high-quality guided image of the same identity as input to our guided face restoration network (GFRNet). |
111 | Compressing the Input for CNNs with the First-Order Scattering Transform | Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko | We subsequently apply our Imagenet trained hybrid model as a base model on a detection system, which typically has larger image inputs. |
112 | Face De-Spoofing: Anti-Spoofing via Noise Modeling | Amin Jourabloo, Yaojie Liu, Xiaoming Liu | In this work, motivated by the noise modeling and denoising algorithms, we identify a new problem of face de-spoofing, for the purpose of anti-spoofing: inversely decomposing a spoof face into a spoof noise and a live face, and then utilizing the spoof noise for classification. |
113 | Faces as Lighting Probes via Unsupervised Deep Highlight Extraction | Renjiao Yi, Chenyang Zhu, Ping Tan, Stephen Lin | We present a method for estimating detailed scene illumination using human faces in a single image. |
114 | Unsupervised Hard Example Mining from Videos for Improved Object Detection | SouYoung Jin, Aruni RoyChowdhury, Huaizu Jiang, Ashish Singh, Aditya Prasad, Deep Chakraborty, Erik Learned-Miller | In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output of a trained detector on video sequences. |
115 | On Offline Evaluation of Vision-based Driving Models | Felipe Codevilla, Antonio M. Lopez, Vladlen Koltun, Alexey Dosovitskiy | In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. |
116 | Deep Fundamental Matrix Estimation | Rene Ranftl, Vladlen Koltun | We present an approach to robust estimation of fundamental matrices from noisy data contaminated by outliers. |
117 | ContextVP: Fully Context-Aware Video Prediction | Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos | To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. |
118 | Visual Psychophysics for Making Face Recognition Algorithms More Explainable | Brandon RichardWebster, So Yon Kwon, Christopher Clarizio, Samuel E. Anthony, Walter J. Scheirer | In this paper, we suggest that visual psychophysics is a viable methodology for making face recognition algorithms more explainable. |
119 | TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild | Matthias Muller, Adel Bibi, Silvio Giancola, Salman Alsubaihi, Bernard Ghanem | In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. |
120 | Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image | Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu | We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model. |
121 | Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model | Baris Gecer, Binod Bhattarai, Josef Kittler, Tae-Kyun Kim | We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with a wide range of expressions, poses, and illuminations conditioned by synthetic images sampled from a 3D morphable model. |
122 | Improved Structure from Motion Using Fiducial Marker Matching | Joseph DeGol, Timothy Bretl, Derek Hoiem | In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present. To validate our algorithm, we introduce a new dataset with 16 image collections of large indoor scenes with challenging characteristics (e.g., blank hallways, glass facades, brick walls) and with markers placed throughout. |
123 | Conditional Prior Networks for Optical Flow | Yanchao Yang, Stefano Soatto | We introduce a novel architecture, called Conditional Prior Network (CPN), and show how to train it to yield a conditional prior. |
124 | Unsupervised Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training | Yang Zou, Zhiding Yu, B.V.K. Vijaya Kumar, Jinsong Wang | In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternatively generating pseudo labels on target data and re-training the model with these labels. |
125 | DetNet: Design Backbone for Object Detection | Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun | Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection. |
126 | BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation | Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang | In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). |
127 | HairNet: Single-View Hair Reconstruction using Convolutional Neural Networks | Yi Zhou, Liwen Hu, Jun Xing, Weikai Chen, Han-Wei Kung, Xin Tong, Hao Li | We introduce a deep learning-based method to generate full 3D hair geometry from an unconstrained image. |
128 | Neural Network Encapsulation | Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang | To resolve this limitation, we approximate the routing process with two branches: a master branch which collects primary information from its direct contact in the lower layer and an aide branch that replenishes master based on pattern variants encoded in other lower capsules. |
129 | StarMap for Category-Agnostic Keypoint and Viewpoint Estimation | Xingyi Zhou, Arjun Karpur, Linjie Luo, Qixing Huang | We propose a category-agnostic keypoint representation, which combines a multi-peak heatmap (StarMap) for all the keypoints and their corresponding features as 3D locations in the canonical viewpoint (CanViewFeature) defined for each instance. |
130 | Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation | Yikang Li, Wanli Ouyang, Bolei Zhou, Jianping Shi, Chao Zhang, Xiaogang Wang | To improve the efficiency of scene graph generation, we propose a subgraph-based connection graph to concisely represent the scene graph during the inference. |
131 | Multi-Fiber Networks for Video Recognition | Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng | In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks. |
132 | Towards Human-Level License Plate Recognition | Jiafan Zhuang, Saihui Hou, Zilei Wang, Zheng-Jun Zha | In this paper, we propose a novel LPR framework consisting of semantic segmentation and character counting, towards achieving human-level performance. |
133 | Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition | Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy | Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition |
134 | Generalized Loss-Sensitive Adversarial Learning with Manifold Margins | Marzieh Edraki, Guo-Jun Qi | Thus, we define a pullback operator to map samples back to their data manifold, and a manifold margin is defined as the distance between the pullback representations to distinguish between real and fake samples and learn the optimal generators. |
135 | Pose Proposal Networks | Taiki Sekii | We propose a novel method to detect an unknown number of articulated 2D poses in real time. |
136 | Less is More: Picking Informative Frames for Video Captioning | Yangyu Chen, Shuhui Wang, Weigang Zhang, Qingming Huang | We propose a plug-and-play PickNet to perform informative frame picking in video captioning. |
137 | Robust Optical Flow in Rainy Scenes | Ruoteng Li, Robby T. Tan, Loong-Fah Cheong | To resolve the problem, we introduce a residue channel, a single channel (gray) image that is free from rain, and its colored version, a colored-residue image. We also provide an optical flow dataset consisting of both synthetic and real rain images. |
138 | Into the Twilight Zone: Depth Estimation using Joint Structure-Stereo Optimization | Aashish Sharma, Loong-Fah Cheong | We present a joint Structure-Stereo optimization model that is robust for disparity estimation under low-light conditions. |
139 | Structured Siamese Network for Real-Time Visual Tracking | Yunhua Zhang, Lijun Wang, Jinqing Qi, Dong Wang, Mengyang Feng, Huchuan Lu | In this paper, we circumvent this issue by proposing a local structure learning method, which simultaneously considers the local patterns of the target and their structural relationships for more accurate target tracking. |
140 | Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation | Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu | In this paper, we use an instance-level salient object detector to automatically generate salient instances (candidate objects) for training images. |
141 | Learning Deep Representations with Probabilistic Knowledge Transfer | Nikolaos Passalis, Anastasios Tefas | In this paper we propose a novel probabilistic knowledge transfer method that works by matching the probability distribution of the data in the feature space instead of their actual representation. |
142 | Recycle-GAN: Unsupervised Video Retargeting | Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh | We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver’s speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert’s style. |
143 | Escaping from Collapsing Modes in a Constrained Space | Chia-Che Chang, Chieh Hubert Lin, Che-Rung Lee, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen | We propose a new model, called BEGAN with a Constrained Space (BEGAN-CS), which includes a latent-space constraint in the loss function. |
144 | Integrating Egocentric Videos in Top-view Surveillance Videos: Joint Identification and Temporal Alignment | Shervin Ardeshir, Ali Borji | In this paper, we aim to relate these two sources of information from a surveillance standpoint, namely in terms of identification and temporal alignment. |
145 | Cross-Modal and Hierarchical Modeling of Video and Text | Bowen Zhang, Hexiang Hu, Fei Sha | In this paper, we investigate the modeling techniques for such hierarchical sequential data where there are correspondences across multiple modalities. |
146 | Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset | Qi Guo, Iuri Frosio, Orazio Gallo, Todd Zickler, Jan Kautz | We propose a two-stage, deep-learning approach to address all of these sources of artifacts simultaneously. |
147 | Visual-Inertial Object Detection and Mapping | Xiaohan Fei, Stefano Soatto | We present a method to populate an unknown environment with models of previously seen objects, placed in a Euclidean reference frame that is inferred causally and on-line using monocular video along with inertial sensors. We test our algorithm on existing datasets, and also introduce the VISMA dataset, that provides ground truth pose, point-cloud map, and object models, along with time-stamped inertial measurements. |
148 | Zero-Shot Object Detection | Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, Ajay Divakaran | We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training. We present a principled approach by first adapting visual-semantic embeddings for ZSD. |
149 | Tracking Emerges by Colorizing Videos | Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy | We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision. |
150 | Actor-centric Relation Network | Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid | We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. |
151 | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy | We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices. |
152 | SkipNet: Learning Dynamic Routing in Convolutional Networks | Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez | We formulate the dynamic skipping problem in the context of sequential decision making and propose a hybrid learning algorithm that combines supervised learning and reinforcement learning to address the challenges of non-differentiable skipping decisions. |
153 | Quantized Densely Connected U-Nets for Efficient Landmark Localization | Zhiqiang Tang, Xi Peng, Shijie Geng, Lingfei Wu, Shaoting Zhang, Dimitris Metaxas | In this paper, we propose quantized densely connected U-Nets for efficient visual landmark localization. |
154 | Person Search in Videos with One Portrait Through Visual and Temporal Links | Qingqiu Huang, Wentao Liu, Dahua Lin | In this paper, we aim to tackle this challenge and propose a novel framework, which takes into account the identity invariance along a tracklet, thus allowing person identities to be propagated via both the visual and the temporal links. To promote the study of person search, we construct a large-scale benchmark, which contains 127K manually annotated tracklets from 192 movies. |
155 | HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs | Zerong Zheng, Tao Yu, Hao Li, Kaiwen Guo, Qionghai Dai, Lu Fang, Yebin Liu | We propose a light-weight and highly robust real-time human performance capture method based on a single depth camera and sparse inertial measurement units (IMUs). |
156 | Variational Wasserstein Clustering | Liang Mi, Wen Zhang, Xianfeng Gu, Yalin Wang | We propose a new clustering method based on optimal transportation. |
157 | A Modulation Module for Multi-task Learning with Applications in Image Retrieval | Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, Ying Wu | To address this problem, we propose a general modulation module, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks with minor parameters addition. |
158 | Learning Human-Object Interactions by Graph Parsing Neural Networks | Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu | We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. |
159 | Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data | Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang | In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. |
160 | Decouple Learning for Parameterized Image Operators | Qingnan Fan, Dongdong Chen, Lu Yuan, Gang Hua, Nenghai Yu, Baoquan Chen | To overcome this limitation, we propose a new decouple learning algorithm to learn from the operator parameters to dynamically adjust the weights of a deep network for image operators, denoted as the base network. |
161 | Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification | Xing Wei, Yue Zhang, Yihong Gong, Jiawei Zhang, Nanning Zheng | Motivated by this point, we advocate an alternative pooling method which transforms the CNN feature matrix to an orthonormal matrix consisting of its principal singular vectors. |
162 | Liquid Pouring Monitoring via Rich Sensory Inputs | Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun | In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. |
163 | Leveraging Motion Priors in Videos for Improving Human Segmentation | Yu-Ting Chen, Wen-Yen Chang, Hai-Lun Lu, Tingfan Wu, Min Sun | In this work, we propose to leverage “motion prior” in videos for improving human segmentation in a weakly-supervised active learning setting. |
164 | Triplet Loss in Siamese Network for Object Tracking | Xingping Dong, Jianbing Shen | In this paper, a novel triplet loss is proposed to extract expressive deep feature for object tracking by adding it into Siamese network framework instead of pairwise loss for training. |
165 | Macro-Micro Adversarial Network for Human Parsing | Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang | To address the two kinds of inconsistencies, this paper proposes the Macro-Micro Adversarial Net (MMAN). |
166 | Contour Knowledge Transfer for Salient Object Detection | Xin Li, Fan Yang, Hong Cheng, Wei Liu, Dinggang Shen | Our goal is to overcome this limitation by automatically converting an existing deep contour detection model into a salient object detection model without using any manual salient object masks. |
167 | Point-to-Point Regression PointNet for 3D Hand Pose Estimation | Liuhao Ge, Zhou Ren, Junsong Yuan | Convolutional Neural Networks (CNNs)-based methods for 3D hand pose estimation with depth cameras usually take 2D depth images as input and directly regress holistic 3D hand pose. |
168 | Fine-grained Video Categorization with Redundancy Reduction Attention | Chen Zhu, Xiao Tan, Feng Zhou, Xiao Liu, Kaiyu Yue, Errui Ding, Yi Ma | In this paper, we propose a new network structure, known as Redundancy Reduction Attention (RRA), which learns to focus on multiple discriminative patterns by suppressing redundant feature channels. Furthermore, we have collected two large-scale video datasets, YouTube-Birds and YouTube-Cars, for future researches on fine-grained video categorization. |
169 | Analyzing Clothing Layer Deformation Statistics of 3D Human Motions | Jinlong Yang, Jean-Sebastien Franco, Franck Hetroy-Wheeler, Stefanie Wuhrer | To this purpose we propose a comprehensive analysis of the statistics of this layer with a simple two-component model, based on PCA subspace reduction of the layer information on one hand, and a generic parameter regression model using neural networks on the other hand, designed to regress from any semantic parameter whose variation is observed in a training set, to the layer parameterization space. |
170 | DOCK: Detecting Objects by transferring Common-sense Knowledge | Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee | We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories. |
171 | Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining | Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, Hongbin Zha | We propose a novel deep network architecture based on deep convolutional and recurrent neural networks for single image deraining. |
172 | Multi-Scale Spatially-Asymmetric Recalibration for Image Classification | Yan Wang, Lingxi Xie, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Alan L. Yuille | This paper addresses this issue by a recalibration process, which refers to the surrounding region of each neuron, computes an importance value and multiplies it to the original neural response. |
173 | Fast and Accurate Intrinsic Symmetry Detection | Rajendra Nagar, Shanmuganathan Raman | In this work, we detect the intrinsic reflective symmetry in triangle meshes where we have to find the intrinsically symmetric point for each point of the shape. |
174 | Open Set Domain Adaptation by Backpropagation | Kuniaki Saito, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada | In this paper, we propose a method for an open set domain adaptation scenario, which utilizes adversarial training. |
175 | Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance | Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee | In this work we introduce a simple, efficient zero-shot learning approach based on this observation. |
176 | CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering | Zhengqi Li, Noah Snavely | To that end, we present CGINTRINSICS, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. |
177 | Stereo Computation for a Single Mixture Image | Yiran Zhong, Yuchao Dai, Hongdong Li | In this work we give a novel deep-learning based solution, by jointly solving the two subtasks of image layer separation as well as stereo matching. This paper proposes an original problem of stereo computation from a single (additive) mixture image, a challenging problem that had not been researched before. |
178 | Objects that Sound | Relja Arandjelovic, Andrew Zisserman | In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the audio signal. |
179 | Iterative Crowd Counting | Viresh Ranjan, Hieu Le, Minh Hoai | In this work, we tackle the problem of crowd counting in images. |
180 | Weakly Supervised Region Proposal Network and Object Detection | Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille | In this paper, we propose a weakly supervised region proposal network which is trained using only image-level annotations. |
181 | Image Super-Resolution Using Very Deep Residual Channel Attention Networks | Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu | To solve these problems, we propose the very deep residual channel attention networks (RCAN). |
182 | Dividing and Aggregating Network for Multi-view Action Recognition | Dongang Wang, Wanli Ouyang, Wen Li, Dong Xu | In this paper, we propose a new Dividing and Aggregating Network (DA-Net) for multi-view action recognition. |
183 | Layer-structured 3D Scene Inference via View Synthesis | Shubham Tulsiani, Richard Tucker, Noah Snavely | We present an approach to infer a layer-structured 3D representation of a scene from a single input image. |
184 | Deblurring Natural Image Using Super-Gaussian Fields | Yuhang Liu, Wenyong Dong, Dong Gong, Lei Zhang, Qinfeng Shi | To address the above issues, we present a novel image prior for image deblurring based on a Super-Gaussian field model with adaptive structures. |
185 | Learning Category-Specific Mesh Reconstruction from Image Collections | Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik | We present a learning framework for recovering the 3D shape, camera, and texture of an object from a single image. |
186 | Selective Zero-Shot Classification with Augmented Attributes | Jie Song, Chengchao Shen, Jie Lei, An-Xiang Zeng, Kairi Ou, Dacheng Tao, Mingli Song | In this paper, we introduce a selective zero-shot classification problem: how can the classifier avoid making dubious predictions? |
187 | Real-time ‘Actor-Critic’ Tracking | Boyu Chen, Dong Wang, Peixia Li, Shuang Wang, Huchuan Lu | In this work, we propose a novel tracking algorithm with real-time performance based on the ‘Actor-Critic’ framework. |
188 | Zero-Annotation Object Detection with Web Knowledge Transfer | Qingyi Tao, Hao Yang, Jianfei Cai | On the contrary, we propose an object detection method that does not require any form of human annotation on target tasks, by exploiting freely available web images. |
189 | Question-Guided Hybrid Convolution for Visual Question Answering | Peng Gao, Hongsheng Li, Shuang Li, Pan Lu, Yikang Li, Steven C.H. Hoi, Xiaogang Wang | In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). |
190 | Fully Motion-Aware Network for Video Object Detection | Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng | In this paper, we propose an end-to-end model called fully motion-aware network (MANet), which jointly calibrates the features of objects on both pixel-level and instance-level in a unified framework. |
191 | Learning to Forecast and Refine Residual Motion for Image-to-Video Generation | Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas | We combine the benefits of both approaches and propose a two-stage generation framework where videos are generated from structures and then refined by temporal signals. |
192 | Geometric Constrained Joint Lane Segmentation and Lane Boundary Detection | Jie Zhang, Yi Xu, Bingbing Ni, Zhenyu Duan | In this paper, we establish a multiple-task learning framework to segment lane areas and detect lane boundaries simultaneously. |
193 | Deterministic Consensus Maximization with Biconvex Programming | Zhipeng Cai, Tat-Jun Chin, Huu Le, David Suter | In this paper, we propose an efficient deterministic optimization algorithm for consensus maximization. |
194 | Lifting Layers: Analysis and Applications | Peter Ochs, Tim Meinhardt, Laura Leal-Taixe, Michael Moeller | In this paper we propose a novel non-linear transfer function, called lifting, which is motivated from a related technique in convex optimization. |
195 | Simultaneous Edge Alignment and Learning | Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz | In this paper, we show that label misalignment can cause considerably degraded edge learning quality, and address this issue by proposing a simultaneous edge alignment and learning framework. |
196 | Deep Feature Pyramid Reconfiguration for Object Detection | Tao Kong, Fuchun Sun, Chuanqi Tan, Huaping Liu, Wenbing Huang | In this paper, we begin by investigating current feature pyramids solutions, and then reformulate the feature pyramid construction as the feature reconfiguration process. |
197 | Unpaired Image Captioning by Language Pivoting | Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang | We present an approach to this unpaired image captioning problem by language pivoting. |
198 | Goal-Oriented Visual Question Generation via Intermediate Rewards | Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton van den Hengel | Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive and informativeness that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. |
199 | Modeling Varying Camera-IMU Time Offset in Optimization-Based Visual-Inertial Odometry | Yonggen Ling, Linchao Bao, Zequn Jie, Fengming Zhu, Ziyang Li, Shanmin Tang, Yongsheng Liu, Wei Liu, Tong Zhang | In this work, we propose a nonlinear optimization-based monocular visual inertial odometry (VIO) with varying camera-IMU time offset modeled as an unknown variable. |
200 | Teaching Machines to Understand Baseball Games: Large-Scale Baseball Video Database for Multiple Video Understanding Tasks | Minho Shim, Young Hwi Kim, Kyungmin Kim, Seon Joo Kim | To this end, we introduce a new large-scale baseball video dataset called the BBDB, which is produced semi-automatically by using play-by-play texts available online. |
201 | Receptive Field Block Net for Accurate and Fast Object Detection | Songtao Liu, Di Huang, Yunhong Wang | In this paper, we explore an alternative to build a fast and accurate detector by strengthening lightweight features using a hand-crafted mechanism. |
202 | DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model | Stephane Lathuiliere, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud | In this paper we address the problem of how to robustly train a ConvNet for regression, or deep robust regression. |
203 | Deep Bilinear Learning for RGB-D Action Recognition | Jian-Fang Hu, Wei-Shi Zheng, Jiahui Pan, Jianhuang Lai, Jianguo Zhang | In this paper, we focus on exploring modality-temporal mutual information for RGB-D action recognition. |
204 | RelocNet: Continuous Metric Learning Relocalisation using Neural Nets | Vassileios Balntas, Shuda Li, Victor Prisacariu | We propose a method of learning suitable convolutional representations for camera pose retrieval based on nearest neighbour matching and continuous metric learning-based feature descriptors. |
205 | Generative Semantic Manipulation with Mask-Contrasting GAN | Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing | In this work, we focus on a more challenging semantic manipulation task, aiming at modifying the semantic meaning of an object while preserving its own characteristics (e.g. viewpoints and shapes), such as cow$\rightarrow$sheep, motor$\rightarrow$bicycle, cat$\rightarrow$dog. |
206 | Interpolating Convolutional Neural Networks Using Batch Normalization | Gratianus Wesley Putra Data, Kirjon Ngu, David William Murray, Victor Adrian Prisacariu | Inspired by recent work on universal representations for neural networks, we propose a simple emulation of this mechanism by purposing batch normalization layers to discriminate visual classes, and formulating a way to combine them to solve new tasks. |
207 | SketchyScene: Richly-Annotated Scene Sketches | Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, Hao Zhang | We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and scene level. |
208 | An Adversarial Approach to Hard Triplet Generation | Yiru Zhao, Zhongming Jin, Guo-jun Qi, Hongtao Lu, Xian-sheng Hua | For this purpose, we propose an adversarial network for Hard Triplet Generation (HTG) to optimize the network ability in distinguishing similar examples of different categories as well as grouping varied examples of the same categories. |
209 | Toward Characteristic-Preserving Image-based Virtual Try-On Network | Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang | In this work, we propose a new fully-learnable Characteristic-Preserving Virtual Try-On Network (CP-VTON) for addressing all real-world challenges in this task. |
210 | Estimating the Success of Unsupervised Image to Image Translation | Sagie Benaim, Tomer Galanti, Lior Wolf | We propose a novel bound for predicting the success of unsupervised cross domain mapping methods, which is motivated by the recently proposed simplicity hypothesis. |
211 | SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images | Benjamin Coors, Alexandru Paul Condurache, Andreas Geiger | In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. |
212 | Efficient Uncertainty Estimation for Semantic Segmentation in Videos | Po-Yu Huang, Wan-Ting Hsu, Chun-Yueh Chiu, Ting-Fan Wu, Min Sun | In this work, we propose the region-based temporal aggregation (RTA) method which leverages the temporal information in videos to simulate the sampling procedure. |
213 | Deep Cross-modality Adaptation via Semantics Preserving Adversarial Learning for Sketch-based 3D Shape Retrieval | Jiaxin Chen, Yi Fang | To address this problem, we propose a novel framework to learn a discriminative deep cross-modality adaptation model in this paper. |
214 | Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: the Benefit of Target Expectation Maximization | Guoliang Kang, Liang Zheng, Yan Yan, Yi Yang | In this paper, we make two contributions to unsupervised domain adaptation (UDA) using the convolutional neural network (CNN). |
215 | ICNet for Real-Time Semantic Segmentation on High-Resolution Images | Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia | We focus on the challenging task of real-time semantic segmentation in this paper. |
216 | Parallel Feature Pyramid Network for Object Detection | Seung-Wook Kim, Hyong-Keun Kook, Jee-Young Sun, Mun-Cheon Kang, Sung-Jea Ko | To overcome this limitation, we propose a CNN-based object detection architecture, referred to as a parallel feature pyramid (FP) network (PFPNet), where the FP is constructed by widening the network width instead of increasing the network depth. |
217 | MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network | Muhammed Kocabas, Salih Karagoz, Emre Akbas | In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method. |
218 | Deep Directional Statistics: Pose Estimation with Uncertainty Quantification | Sergey Prokudin, Peter Gehler, Sebastian Nowozin | In this paper, we propose a novel probabilistic deep learning model for the task of angular regression. |
219 | Person Search by Multi-Scale Matching | Xu Lan, Xiatian Zhu, Shaogang Gong | In this work, we address this multi-scale person search challenge by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model. |
220 | Learn-to-Score: Efficient 3D Scene Exploration by Predicting View Utility | Benjamin Hepp, Debadeepta Dey, Sudipta N. Sinha, Ashish Kapoor, Neel Joshi, Otmar Hilliges | We propose to learn a better utility function that predicts the usefulness of future viewpoints. |
221 | Joint Representation and Truncated Inference Learning for Correlation Filter based Tracking | Yingjie Yao, Xiaohe Wu, Lei Zhang, Shiguang Shan, Wangmeng Zuo | In this paper, we investigate the joint learning of deep representation and model adaptation, where an updater network is introduced for better tracking on future frame by taking current frame representation, tracking result, and last CF tracker as input. |
222 | TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection | Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, Thomas Huang | This work provides a simple approach to discover tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). |
223 | Hierarchy of Alternating Specialists for Scene Recognition | Hyo Jin Kim, Jan-Michael Frahm | We introduce a method for improving convolutional neural networks (CNNs) for scene classification. |
224 | Revisiting RCNN: On Awakening the Classification Power of Faster RCNN | Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang | In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization. |
225 | A Hybrid Model for Identity Obfuscation by Face Replacement | Qianru Sun, Ayush Tewari, Weipeng Xu, Mario Fritz, Christian Theobalt, Bernt Schiele | We propose a new hybrid approach to obfuscate identities in photos by head replacement. |
226 | 3D Scene Flow from 4D Light Field Gradients | Sizhuo Ma, Brandon M. Smith, Mohit Gupta | This paper presents novel techniques for recovering 3D dense scene flow, based on differential analysis of 4D light fields. |
227 | RIDI: Robust IMU Double Integration | Hang Yan, Qi Shan, Yasutaka Furukawa | This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. |
228 | Superpixel Sampling Networks | Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz | We develop a new differentiable model for superpixel sampling that leverages deep networks for learning superpixel segmentation. |
229 | Towards Robust Neural Networks via Random Self-ensemble | Xuanqing Liu, Minhao Cheng, Huan Zhang, Cho-Jui Hsieh | In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. |
230 | The Sound of Pixels | Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba | We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. |
231 | Adaptive Affinity Fields for Semantic Segmentation | Tsung-Wei Ke, Jyh-Jing Hwang, Ziwei Liu, Stella X. Yu | Instead of learning to enforce semantic labels on individual pixels, we propose to enforce affinity field patterns in individual pixel neighbourhoods, i.e., the semantic label patterns of whether neighbour pixels are in the same segment should match between the prediction and the ground-truth. |
232 | Joint Map and Symmetry Synchronization | Yifan Sun, Zhenxiao Liang, Xiangru Huang, Qixing Huang | In this paper, we study the problem of jointly optimizing self-symmetries and pair-wise maps among a collection of similar objects. |
233 | EC-Net: an Edge-aware Point set Consolidation Network | Lequan Yu, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, Pheng-Ann Heng | In this paper, we present the first deep learning based edge-aware technique to facilitate the consolidation of point clouds. |
234 | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | Wayne Wu, Yunxuan Zhang, Cheng Li, Chen Qian, Chen Change Loy | We present a novel learning-based framework for face reenactment. |
235 | Semi-Supervised Generative Adversarial Hashing for Image Retrieval | Guan’an Wang, Qinghao Hu, Jian Cheng, Zengguang Hou | In this paper, inspired by the idea of generative models and the minimax two-player game, we propose a novel semi-supervised generative adversarial hashing (SSGAH) approach. |
236 | Training Binary Weight Networks via Semi-Binary Decomposition | Qinghao Hu, Gang Li, Peisong Wang, Yifan Zhang, Jian Cheng | In this paper, we propose a novel semi-binary decomposition method which decomposes a matrix into two binary matrices and a diagonal matrix. |
237 | Part-Activated Deep Reinforcement Learning for Action Prediction | Lei Chen, Jiwen Lu, Zhanjie Song, Jie Zhou | In this paper, we propose a part-activated deep reinforcement learning (PA-DRL) for action prediction. |
238 | Learning to Anonymize Faces for Privacy Preserving Action Detection | Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo | In this paper, we propose a new principled approach for learning a video anonymizer. |
239 | Lifelong Learning via Progressive Distillation and Retrospection | Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin | In this work, we propose a novel approach to lifelong learning, which tries to seek a better balance between preservation and adaptation via two techniques: Distillation and Retrospection. |
240 | Focus, Segment and Erase: An Efficient Network for Multi-Label Brain Tumor Segmentation | Xuan Chen, Jun Hao Liew, Wei Xiong, Chee-Kong Chui, Sim-Heng Ong | In this paper, we propose a novel end-to-end trainable network named FSENet to address the aforementioned issues. |
241 | Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition | Xiaohang Zhan, Ziwei Liu, Junjie Yan, Dahua Lin, Chen Change Loy | In this work, we show that unlabeled face data can be as effective as the labeled ones. |
242 | A Closed-form Solution to Photorealistic Image Stylization | Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz | In this paper, we propose a method to address these issues. |
243 | MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics | Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, Honglak Lee | We leverage this structure and present a novel Motion Transformation Variational Auto-Encoders (MT-VAE) for learning motion sequence generation. |
244 | 3D Recurrent Neural Networks with Context Fusion for Point Cloud Semantic Segmentation | Xiaoqing Ye, Jiamao Li, Hexiao Huang, Liang Du, Xiaolin Zhang | In this paper, a novel end-to-end approach for unstructured point cloud semantic segmentation is proposed to exploit the inherent contextual features. |
245 | Rethinking the Form of Latent States in Image Captioning | Bo Dai, Deming Ye, Dahua Lin | Existing captioning models usually represent latent states as vectors, taking this practice for granted. |
246 | Move Forward and Tell: A Progressive Generator of Video Descriptions | Yilei Xiong, Bo Dai, Dahua Lin | We present an efficient framework that can generate a coherent paragraph to describe a given video. |
247 | Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos | Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S. Ryoo, David J. Crandall | In this paper, we wish to solve two specific problems: (1) given two or more synchronized third-person videos of a scene, produce a pixel-level segmentation of each visible person and identify corresponding people across different views (i.e., determine who in camera A corresponds with whom in camera B), and (2) given one or more synchronized third-person videos as well as a first-person video taken by a mobile or wearable camera, segment and identify the camera wearer in the third-person videos. |
248 | Transductive Semi-Supervised Deep Learning using Min-Max Features | Weiwei Shi, Yihong Gong, Chris Ding, Zhiheng Ma, Xiaoyu Tao, Nanning Zheng | In this paper, we propose a Transductive Semi-Supervised Deep Learning (TSSDL) method that is effective for training Deep Convolutional Neural Network (DCNN) models. |
249 | SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection | Yonghyun Kim, Bong-Nam Kang, Daijin Kim | We evaluate our method on VOC PASCAL and MS COCO dataset. |
250 | Visual Tracking via Spatially Aligned Correlation Filters Network | Mengdan Zhang, Qiang Wang, Junliang Xing, Jin Gao, Peixi Peng, Weiming Hu, Steve Maybank | Visual Tracking via Spatially Aligned Correlation Filters Network |
251 | Predicting Future Instance Segmentation by Forecasting Convolutional Features | Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek | In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. |
252 | MVSNet: Depth Inference for Unstructured Multi-view Stereo | Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan | We present an end-to-end deep learning architecture for depth map inference from multi-view images. |
253 | Learning Monocular Depth by Distilling Cross-domain Stereo Networks | Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang | In this paper, we propose to use the stereo matching network as a proxy to learn depth from synthetic data and use predicted stereo disparity maps for supervising the monocular depth estimation network. |
254 | Person Re-identification with Deep Similarity-Guided Graph Neural Network | Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang | In this paper, we propose a novel deep learning framework, named Similarity-Guided Graph Neural Network (SGGNN) to overcome such limitations. |
255 | Learning and Matching Multi-View Descriptors for Registration of Point Clouds | Lei Zhou, Siyu Zhu, Zixin Luo, Tianwei Shen, Runze Zhang, Mingmin Zhen, Tian Fang, Long Quan | We have demonstrated the boost of our approaches to registration on the public scanning and multi-view stereo datasets. |
256 | Flow-Grounded Spatial-Temporal Video Prediction from Still Images | Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang | In this work, we study the problem of generating consecutive multiple future frames by observing one single still image only. |
257 | The Contextual Loss for Image Transformation with Non-Aligned Data | Roey Mechrez, Itamar Talmi, Lihi Zelnik-Manor | We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. |
258 | Online Dictionary Learning for Approximate Archetypal Analysis | Jieru Mei, Chunyu Wang, Wenjun Zeng | We propose a variant of archetypal analysis which scales gracefully to large datasets. |
259 | Video Object Segmentation by Learning Location-Sensitive Embeddings | Hai Ci, Chunyu Wang, Yizhou Wang | To deal with appearance changes, for a test video, we propose a robust model adaptation method which pre-scans the whole video, generates pseudo foreground/background labels and retrains the model based on the labels. |
260 | Hashing with Binary Matrix Pursuit | Fatih Cakir, Kun He, Stan Sclaroff | We propose theoretical and empirical improvements for two-stage hashing methods. |
261 | Learning to Capture Light Fields through a Coded Aperture Camera | Yasutaka Inagaki, Yuto Kobayashi, Keita Takahashi, Toshiaki Fujii, Hajime Nagahara | We propose a learning-based framework for acquiring a light field through a coded aperture camera. |
262 | Learning to Reconstruct High-quality 3D Shapes with Cascaded Fully Convolutional Networks | Yan-Pei Cao, Zheng-Ning Liu, Zheng-Fei Kuang, Leif Kobbelt, Shi-Min Hu | We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. |
263 | X2Face: A network for controlling face generation using images, audio, and pose codes | Olivia Wiles, A. Sophia Koepke, Andrew Zisserman | The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio). |
264 | End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners | Simon Hecker, Dengxin Dai, Luc Van Gool | We investigate the problem in a more realistic setting, which consists of a surround-view camera system with eight cameras, a route planner, and a CAN bus reader. With such a sensor setup we collect a new driving dataset, covering diverse driving scenarios and varying weather/illumination conditions. |
265 | Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding | Christos Sakaridis, Dengxin Dai, Simon Hecker, Luc Van Gool | In this paper, we propose a novel method, named Curriculum Model Adaptation (CMAda), which gradually adapts a semantic segmentation model from light synthetic fog to dense real fog in multiple steps, using both synthetic and real foggy data. |
266 | DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures | Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun | We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives. |
267 | Revisiting Autofocus for Smartphone Cameras | Abdullah Abuolaim, Abhijith Punnappurath, Michael S. Brown | The work in this paper aims to revisit AF for smartphones within the context of temporal image data. |
268 | Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence | Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr | One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. |
269 | A Dataset of Flash and Ambient Illumination Pairs from the Crowd | Yagiz Aksoy, Changil Kim, Petr Kellnhofer, Sylvain Paris, Mohamed Elgharib, Marc Pollefeys, Wojciech Matusik | We present a dataset of thousands of ambient and flash illumination pairs to enable studying flash photography and other applications that can benefit from having separate illuminations. Different than the typical use of crowdsourcing in generating computer vision datasets, we make use of the crowd to directly take the photographs that make up our dataset. |
270 | Deep Burst Denoising | Clement Godard, Kevin Matzen, Matt Uyttendaele | In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). |
271 | MaskConnect: Connectivity Learning by Gradient Descent | Karim Ahmed, Lorenzo Torresani | In this work we remove these predefined choices and propose an algorithm to learn the connections between modules in the network. |
272 | ISNN: Impact Sound Neural Network for Audio-Visual Object Classification | Auston Sterling, Justin Wilson, Sam Lowe, Ming C. Lin | We evaluate our method on multiple datasets of both recorded and synthesized sounds. |
273 | Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets | Xiaofeng Liu, B. V. K. Vijaya Kumar, Chao Yang, Qingming Tang, Jane You | Specifically, we propose a dependency-aware attention control (DAC) network, which resorts to actor-critic reinforcement learning for sequential attention decision of each image embedding to fully exploit the rich correlation cues among the unordered images. |
274 | StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction | Sameh Khamis, Sean Fanello, Christoph Rhemann, Adarsh Kowdle, Julien Valentin, Shahram Izadi | A key insight of this paper is that the network achieves a sub-pixel matching precision that is a magnitude higher than those of traditional stereo matching approaches. |
275 | Compositing-aware Image Search | Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia | We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks. We collect an evaluation set consisting of eight object categories commonly used in compositing tasks, on which we demonstrate that our approach significantly outperforms other search techniques. |
276 | Online Multi-Object Tracking with Dual Matching Attention Networks | Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang | In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. |
277 | Improving Sequential Determinantal Point Processes for Supervised Video Summarization | Aidean Sharghi, Ali Borji, Chengtao Li, Tianbao Yang, Boqing Gong | In terms of learning, we propose a large-margin algorithm to address the exposure bias problem in SeqDPP. |
278 | Online Detection of Action Start in Untrimmed, Streaming Videos | Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang | We propose three novel methods to specifically address the challenges in training ODAS models: (1) hard negative samples generation based on Generative Adversarial Network (GAN) to distinguish ambiguous background, (2) explicitly modeling the temporal consistency between data around action start and data succeeding action start, and (3) adaptive sampling strategy to handle the scarcity of training data. |
279 | Volumetric performance capture from minimal camera viewpoints | Andrew Gilbert, Marco Volino, John Collomosse, Adrian Hilton | We present a convolutional autoencoder that enables high fidelity volumetric reconstructions of human performance to be captured from multi-view video comprising only a small set of camera views. |
280 | Coreset-Based Neural Network Compression | Abhimanyu Dubey, Moitreya Chatterjee, Narendra Ahuja | We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters. |
281 | A Framework for Evaluating 6-DOF Object Trackers | Mathieu Garon, Denis Laurendeau, Jean-Francois Lalonde | We present a challenging and realistic novel dataset for evaluating 6-DOF object tracking algorithms. Using a data acquisition pipeline based on a commercial motion capture system for acquiring accurate ground truth poses of real objects with respect to a Kinect V2 camera, we build a dataset which contains a total of 297 calibrated sequences. |
282 | Learning to Separate Object Sounds by Watching Unlabeled Video | Ruohan Gao, Rogerio Feris, Kristen Grauman | We propose to learn audio-visual object models from unlabeled video, then exploit the visual context to perform audio source separation in novel videos. |
283 | Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency | Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James M. Rehg | This paper addresses the challenging problem of estimating the general visual attention of people in images. |
284 | Neural Graph Matching Networks for Fewshot 3D Action Recognition | Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei | We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples. |
285 | Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos | Bingbin Liu, Serena Yeung, Edward Chou, De-An Huang, Li Fei-Fei, Juan Carlos Niebles | In this work, we introduce an approach for explicitly and dynamically reasoning about compositional natural language descriptions of activity in videos. |
286 | Attention-aware Deep Adversarial Hashing for Cross-Modal Retrieval | Xi Zhang, Hanjiang Lai, Jiashi Feng | To further address this problem, we propose an adversarial hashing network with an attention mechanism to enhance the measurement of content similarities by selectively focusing on the informative parts of multi-modal data. |
287 | 3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration | Zi Jian Yew, Gim Hee Lee | In this paper, we propose the 3DFeat-Net which learns both 3D feature detector and descriptor for point cloud matching using weak supervision. We create training and benchmark outdoor Lidar datasets, and our experiments on these datasets show that our 3DFeat-Net outperforms existing handcrafted and learned 3D features. |
288 | Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers | Eunbyung Park, Alexander C. Berg | Our core contribution is an offline meta-learning-based method to adjust the initial deep networks used in online adaptation-based tracking. |
289 | Variable Ring Light Imaging: Capturing Transient Subsurface Scattering with An Ordinary Camera | Ko Nishino, Art Subpa-asa, Yuta Asano, Mihoko Shimano, Imari Sato | We introduce a novel imaging method that enables the decomposition of the appearance of a fronto-parallel real-world surface into images of light with bounded path lengths, i.e., transient subsurface light transport. |
290 | Graph R-CNN for Scene Graph Generation | Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh | We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images. |
291 | Deep Domain Generalization via Conditional Invariant Adversarial Networks | Ya Li, Xinmei Tian, Mingming Gong, Yajing Liu, Tongliang Liu, Kun Zhang, Dacheng Tao | To address the above two drawbacks, we propose an end-to-end conditional invariant deep domain generalization approach by leveraging deep neural networks for domain-invariant representation learning. |
292 | Using LIP to Gloss Over Faces in Single-Stage Face Detection Networks | Siqi Yang, Arnold Wiliem, Shaokang Chen, Brian C. Lovell | In this paper, we call this problem the Instance Perturbation Interference (IPI) problem. |
293 | Pose-Normalized Image Generation for Person Re-identification | Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue | In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditional on the pose. |
294 | Videos as Space-Time Region Graphs | Xiaolong Wang, Abhinav Gupta | In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. |
295 | Learning 3D Human Pose from Structure and Motion | Rishabh Dabral, Anurag Mundhada, Uday Kusupati, Safeer Afaque, Abhishek Sharma, Arjun Jain | We propose two anatomically inspired loss functions and use them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data. |
296 | Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment | Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma | In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. |
297 | HiDDeN: Hiding Data with Deep Networks | Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei | Recent work has shown that deep neural networks are highly sensitive to tiny perturbations of input images, giving rise to adversarial examples. |
298 | Deep Cross-Modal Projection Learning for Image-Text Matching | Ying Zhang, Huchuan Lu | In this paper, we propose a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss for learning discriminative image-text embeddings. |
299 | Large Scale Urban Scene Modeling from MVS Meshes | Lingjie Zhu, Shuhan Shen, Xiang Gao, Zhanyi Hu | In this paper we present an efficient modeling framework for large scale urban scenes. |
300 | Dual-Agent Deep Reinforcement Learning for Deformable Face Tracking | Minghao Guo, Jiwen Lu, Jie Zhou | In this paper, we propose a dual-agent deep reinforcement learning (DADRL) method for deformable face tracking, which generates bounding boxes and detects facial landmarks interactively from face videos. |
301 | Unified Perceptual Parsing for Scene Understanding | Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun | In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. |
302 | Multimodal Dual Attention Memory for Video Story Question Answering | Kyung-Min Kim, Seong-Ho Choi, Jin-Hwa Kim, Byoung-Tak Zhang | We propose a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM). |
303 | Deep Reinforcement Learning with Iterative Shift for Visual Tracking | Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou | In this paper, we propose a deep reinforcement learning with iterative shift (DRL-IS) method for single object tracking, where an actor-critic network is introduced to predict the iterative shifts of object bounding boxes, and evaluate the shifts to take actions on whether to update object models or re-initialize tracking. |
304 | Collaborative Deep Reinforcement Learning for Multi-Object Tracking | Liangliang Ren, Jiwen Lu, Zifeng Wang, Qi Tian, Jie Zhou | In this paper, we propose a collaborative deep reinforcement learning (C-DRL) method for multi-object tracking. |
305 | Deep Variational Metric Learning | Xudong Lin, Yueqi Duan, Qiyuan Dong, Jiwen Lu, Jie Zhou | In this paper, we propose a deep variational metric learning (DVML) framework to explicitly model the intra-class variance and disentangle the intra-class invariance, namely, the class centers. |
306 | A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Youngjae Yu, Jongseok Kim, Gunhee Kim | We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pairs of multimodal sequence data (e.g. a video sequence and a language sentence). |
307 | Deep Pictorial Gaze Estimation | Seonwook Park, Adrian Spurr, Otmar Hilliges | In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single eye input. |
308 | PSDF Fusion: Probabilistic Signed Distance Function for On-the-fly 3D Data Fusion and Scene Reconstruction | Wei Dong, Qiuyuan Wang, Xin Wang, Hongbin Zha | We propose a novel 3D spatial representation for data fusion and scene reconstruction. |
309 | Multi-Scale Context Intertwining for Semantic Segmentation | Di Lin, Yuanfeng Ji, Dani Lischinski, Daniel Cohen-Or, Hui Huang | In this work, we propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). |
310 | Learning to Fuse Proposals from Multiple Scanline Optimizations in Semi-Global Matching | Johannes L. Schonberger, Sudipta N. Sinha, Marc Pollefeys | We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. |
311 | Saliency Detection in 360° Videos | Ziheng Zhang, Yanyu Xu, Jingyi Yu, Shenghua Gao | This paper presents a novel spherical convolutional neural network based scheme for saliency detection for 360° videos. To validate our approach, we construct a large-scale 360° videos saliency detection benchmark that consists of 104 360° videos viewed by 20+ human subjects. |
312 | Scaling Egocentric Vision: The EPIC-KITCHENS Dataset | Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray | In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. |
313 | AugGAN: Cross Domain Adaptation with GAN-based Data Augmentation | Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, Shang-Hong Lai | Although recent GAN (Generative Adversarial Network) based methods have shown compelling visual results, they are prone to fail at preserving image-objects and maintaining translation consistency when faced with large and complex domain shifts, which reduces their practicality on tasks such as generating large-scale training data for different domains. |
314 | Incremental Non-Rigid Structure-from-Motion with Unknown Focal Length | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool | On its basis we propose a method to simultaneously recover the focal length and the non-rigid shapes. |
315 | Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries | Edgar Margffoy-Tuay, Juan C. Perez, Emilio Botero, Pablo Arbelaez | We propose a novel method that integrates these two insights in order to fully exploit the recursive nature of language. |
316 | Graininess-Aware Deep Feature Learning for Pedestrian Detection | Chunze Lin, Jiwen Lu, Gang Wang, Jie Zhou | In this paper, we propose a graininess-aware deep feature learning method for pedestrian detection. |
317 | Acquisition of Localization Confidence for Accurate Object Detection | Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang | In this paper, we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground-truth. |
318 | Learning Shape Priors for Single-View 3D Completion and Reconstruction | Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T. Freeman, Joshua B. Tenenbaum | In this paper, we propose ShapeHD, pushing the limit of single-view shape completion and reconstruction by integrating deep generative models with adversarially learned shape priors. |
319 | R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting | Nicholas Rhinehart, Kris M. Kitani, Paul Vernaza | We propose a method to forecast a vehicle’s ego-motion as a distribution over spatiotemporal paths, conditioned on features (e.g., from LIDAR and images) embedded in an overhead map. |
320 | Synthetically Supervised Feature Learning for Scene Text Recognition | Yang Liu, Zhaowen Wang, Hailin Jin, Ian Wassell | We propose to leverage the parameters that lead to the output images to improve image feature learning. |
321 | Localization Recall Precision (LRP): A New Performance Metric for Object Detection | Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan | In this paper, we propose “Localization Recall Precision (LRP) Error”, a new metric specifically designed for object detection. |
322 | Second-order Democratic Aggregation | Tsung-Yu Lin, Subhransu Maji, Piotr Koniusz | In this paper we study a class of orderless aggregation functions designed to minimize interference or equalize contributions in the context of second-order features and show that they can be computed just as efficiently as their first-order counterparts and have favorable properties over aggregation by summation. |
323 | Lip Movements Generation at a Glance | Lele Chen, Zhiheng Li, Ross K Maddox, Zhiyao Duan, Chenliang Xu | In this paper, we consider the following task: given an arbitrary audio speech and one lip image of arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. |
324 | Probabilistic Video Generation using Holistic Attribute Control | Jiawei He, Andreas Lehrmann, Joseph Marino, Greg Mori, Leonid Sigal | Based on this intuition, we propose a generative framework for video generation and future prediction. |
325 | AGIL: Learning Attention from Human for Visuomotor Tasks | Ruohan Zhang, Zhuode Liu, Luxin Zhang, Jake A. Whritner, Karl S. Muller, Mary M. Hayhoe, Dana H. Ballard | With this motivation, we propose the AGIL (Attention Guided Imitation Learning) framework. |
326 | Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd | Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li | In this paper, we propose a new occlusion-aware R-CNN (OR-CNN) to improve the detection accuracy in the crowd. |
327 | Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery | Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon | We present an approach to adapt contemporary deep network architectures developed on conventional rectilinear imagery to work on equirectangular 360° panoramic imagery. |
328 | Seeing Tree Structure from Vibration | Tianfan Xue, Jiajun Wu, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman | We propose to tackle this problem through spectrum analysis of motion signals, because vibrations of disconnected branches, though visually similar, often have distinctive natural frequencies. |
329 | Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation | Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz | Compared to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to learn the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects with a real background) and an evaluation split that accounts for the percentage of observed non-rigid pixels. |
330 | HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration | B. Eckart, K. Kim, J. Kautz | In this paper, we present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of a Hierarchical Gaussian Mixture representation. |
331 | Deep Imbalanced Attribute Classification using Visual Attention Aggregation | Nikolaos Sarafianos, Xiang Xu, Ioannis A. Kakadiaris | With that in mind, we propose an effective method that extracts and aggregates visual attention masks at different scales. |
332 | Cross-Modal Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking | Chenglong Li, Chengli Zhu, Yan Huang, Jin Tang, Liang Wang | To address this problem, this paper presents a novel approach to suppress background effects for RGB-T tracking. |
333 | Shift-Net: Image Inpainting via Deep Feature Rearrangement | Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, Shiguang Shan | In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. |
334 | Small-scale Pedestrian Detection Based on Topological Line Localization and Temporal Feature Aggregation | Tao Song, Leiyu Sun, Di Xie, Haiming Sun, Shiliang Pu | Motivated by this, we propose a novel method integrated with somatic topological line localization (TLL) and temporal feature aggregation for detecting multi-scale pedestrians, which works particularly well with small-scale pedestrians that are relatively far from the camera. |
335 | Sub-GAN: An Unsupervised Generative Model via Subspaces | Jie Liang, Jufeng Yang, Hsin-Ying Lee, Kai Wang, Ming-Hsuan Yang | In this paper, we present a subspace-based generative adversarial network (Sub-GAN) which simultaneously disentangles multiple latent subspaces and generates diverse samples correspondingly. |
336 | VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions | Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo | To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. |
337 | Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation | Xinge Zhu, Hui Zhou, Ceyuan Yang, Jianping Shi, Dahua Lin | To this end, we propose a novel loss function, i.e., Conservative Loss, which penalizes the extreme good and bad cases while encouraging the moderate examples. |
338 | Interactive Boundary Prediction for Object Selection | Hoang Le, Long Mai, Brian Price, Scott Cohen, Hailin Jin, Feng Liu | In this paper, we introduce an interaction-aware method for boundary-based image segmentation. |
339 | Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection | Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam | This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM). |
340 | CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving | Xiaodan Liang, Tairui Wang, Luona Yang, Eric Xing | We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach which successfully makes the driving agent achieve higher success rates based on only vision inputs in a high-fidelity car simulator. |
341 | The Devil of Face Recognition is in the Noise | Fei Wang, Liren Chen, Cheng Li, Shiyao Huang, Yanjie Chen, Chen Qian, Chen Change Loy | We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. |
342 | Where Will They Go? Predicting Fine-Grained Adversarial Multi-Agent Motion using Conditional Variational Autoencoders | Panna Felsen, Patrick Lucey, Sujoy Ganguly | In this paper, we present a technique using conditional variational autoencoder which learns a model that “personalizes” prediction to individual agent behavior within a group representation. |
343 | Bi-Real Net: Enhancing the Performance of 1-bit CNNs with Improved Representational Capability and Advanced Training Algorithm | Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wei Liu, Kwang-Ting Cheng | In this work, we study the 1-bit convolutional neural networks (CNNs), of which both the weights and activations are binary. |
344 | X-ray Computed Tomography Through Scatter | Adam Geva, Yoav Y. Schechner, Yonatan Chernyak, Rajiv Gupta | Treating these scattered photons as a source of information, we solve an inverse problem based on a 3D radiative transfer model that includes both elastic (Rayleigh) and inelastic (Compton) scattering. |
345 | Shape Reconstruction Using Volume Sweeping and Learned Photoconsistency | Vincent Leroy, Jean-Sebastien Franco, Edmond Boyer | We consider in this paper the problem of 3D shape reconstruction from multi-view RGB images. |
346 | Unsupervised CNN-based Co-Saliency Detection with Graphical Optimization | Kuang-Jui Hsu, Chung-Chi Tsai, Yen-Yu Lin, Xiaoning Qian, Yung-Yu Chuang | In this paper, we address co-saliency detection in a set of images jointly covering objects of a specific class by an unsupervised convolutional neural network (CNN). |
347 | Unsupervised Person Re-identification by Deep Learning Tracklet Association | Minxian Li, Xiatian Zhu, Shaogang Gong | In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data from videos in an end-to-end deep model optimisation. |
348 | Seeing Deeply and Bidirectionally: A Deep Learning Approach for Single Image Reflection Removal | Jie Yang, Dong Gong, Lingqiao Liu, Qinfeng Shi | We propose a cascade deep neural network, which estimates both the background image and the reflection. |
349 | Learning Data Terms for Non-blind Deblurring | Jiangxin Dong, Jinshan Pan, Deqing Sun, Zhixun Su, Ming-Hsuan Yang | We propose a simple and effective discriminative framework to learn data terms that can adaptively handle blurred images in the presence of severe noise and outliers. |
350 | Mutual Learning to Adapt for Joint Human Parsing and Pose Estimation | Xuecheng Nie, Jiashi Feng, Shuicheng Yan | This paper presents a novel Mutual Learning to Adapt model (MuLA) for joint human parsing and pose estimation. |
351 | Statistically-motivated Second-order Pooling | Kaicheng Yu, Mathieu Salzmann | Here, by contrast, we introduce a statistically-motivated framework that projects the second-order descriptor into a compact vector while improving the representational power. |
352 | Video Re-localization | Yang Feng, Lin Ma, Wei Liu, Tong Zhang, Jiebo Luo | Subsequently, we propose an innovative cross gated bilinear matching model such that every time-step in the reference video is matched against the attentively weighted query video. |
353 | Orthogonal Deep Features Decomposition for Age-Invariant Face Recognition | Yitong Wang, Dihong Gong, Zheng Zhou, Xing Ji, Hao Wang, Zhifeng Li, Wei Liu, Tong Zhang | To reduce the intra-class discrepancy caused by aging, in this paper we propose a novel approach (namely, Orthogonal Embedding CNNs, or OE-CNNs) to learn the age-invariant deep face features. Besides, for complementing the existing cross-age datasets and advancing the research in this field, we construct a brand-new large-scale Cross-Age Face dataset (CAF). |
354 | Long-term Tracking in the Wild: a Benchmark | Jack Valmadre, Luca Bertinetto, Joao F. Henriques, Ran Tao, Andrea Vedaldi, Arnold W.M. Smeulders, Philip H.S. Torr, Efstratios Gavves | We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms. |
355 | Affinity Derivation and Graph Merge for Instance Segmentation | Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu | We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to a same instance. |
356 | Deep Model-Based 6D Pose Refinement in RGB | Fabian Manhardt, Wadim Kehl, Nassir Navab, Federico Tombari | We present a novel approach for model-based 6D pose refinement in color data. |
357 | Zero-Shot Deep Domain Adaptation | Kuan-Chuan Peng, Ziyan Wu, Jan Ernst | To tackle this issue, we propose zero-shot deep domain adaptation (ZDDA), which uses privileged information from task-irrelevant dual-domain pairs. |
358 | Comparator Networks | Weidi Xie, Li Shen, Andrew Zisserman | The objective of this work is set-based verification, e.g. to decide if two sets of images of a face are of the same person or not. |
359 | Deep Regionlets for Object Detection | Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, Navaneeth Bodla, Rama Chellappa | In this paper, we propose a novel object detection framework named “Deep Regionlets” by establishing a bridge between deep neural networks and conventional detection schema for accurate generic object detection. |
360 | DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation | Zuxuan Wu, Xintong Han, Yen-Liang Lin, Mustafa Gokhan Uzunbas, Tom Goldstein, Ser Nam Lim, Larry S. Davis | We present Dual Channel-wise Alignment Networks (DCAN), a simple yet effective approach to reduce domain shift at both pixel-level and feature-level. |
361 | Generating 3D Faces using Convolutional Mesh Autoencoders | Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, Michael J. Black | To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. |
362 | ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking | Oliver Groth, Fabian B. Fuchs, Ingmar Posner, Andrea Vedaldi | In this paper we investigate the passive acquisition of an intuitive understanding of physical principles as well as the active utilisation of this intuition in the context of generalised object stacking. |
363 | Physical Primitive Decomposition | Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we study physical primitive decomposition—understanding an object through its components, each with physical and geometric attributes. |
364 | Inner Space Preserving Generative Pose Machine | Shuangjun Liu, Sarah Ostadabbas | In this paper, we introduce an image “inner space” preserving model that assigns an interpretable low-dimensional pose descriptor (LDPD) to an articulated figure in the image. |
365 | Perturbation Robust Representations of Topological Persistence Diagrams | Anirudh Som, Kowshik Thopalli, Karthikeyan Natesan Ramamurthy, Vinay Venkataraman, Ankita Shukla, Pavan Turaga | In this paper we present theoretically well-grounded approaches to develop novel perturbation robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. |
366 | Hierarchical Relational Networks for Group Activity Recognition and Retrieval | Mostafa S. Ibrahim, Greg Mori | We present a Hierarchical Relational Network that computes relational representations of people, given graph structures describing potential interactions. |
367 | Attention-based Ensemble for Deep Metric Learning | Wonsik Kim, Bhavya Goyal, Kunal Chawla, Jungmin Lee, Keunjoo Kwon | To this end, we propose an attention-based ensemble, which uses multiple attention masks, so that each learner can attend to different parts of the object. |
368 | Neural Procedural Reconstruction for Residential Buildings | Huayi Zeng, Jiaye Wu, Yasutaka Furukawa | This paper proposes a novel 3D reconstruction approach, dubbed Neural Procedural Reconstruction (NPR), which trains deep neural networks to procedurally apply shape grammar rules and reconstruct CAD-quality models from 3D points. |
369 | PyramidBox: A Context-assisted Single Shot Face Detector | Xu Tang, Daniel K. Du, Zeqiang He, Jingtuo Liu | This paper proposes a novel context-assisted single shot face detector, named PyramidBox, to handle the hard face detection problem. |
370 | Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition | Yifei Huang, Minjie Cai, Zhenqiang Li, Yoichi Sato | We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks. |
371 | Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes | Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai | In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. |
372 | Broadcasting Convolutional Network for Visual Relational Reasoning | Simyung Chang, John Yang, SeongUk Park, Nojun Kwak | In this paper, we propose the Broadcasting Convolutional Network (BCN) that extracts key object features from the global field of an entire input image and recognizes their relationship with local features. |
373 | Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning | Uta Buchler, Biagio Brattoli, Bjorn Ommer | Based on deep reinforcement learning we propose a sampling policy that adapts to the state of the network, which is being trained. |
374 | View-graph Selection Framework for SfM | Rajvi Shah, Visesh Chari, P J Narayanan | To model selection costs for this task, we introduce new disambiguation priors based on local geometry. |
375 | DFT-based Transformation Invariant Pooling Layer for Visual Classification | Jongbin Ryu, Ming-Hsuan Yang, Jongwoo Lim | We propose a novel discrete Fourier transform-based pooling layer for convolutional neural networks. |
376 | Learning Compression from Limited Unlabeled Data | Xiangyu He, Jian Cheng | In this paper, we reveal that re-normalization is the practical and effective way to alleviate the above limitations. |
377 | Bayesian Semantic Instance Segmentation in Open Set World | Trung Pham, Vijay Kumar B. G., Thanh-Toan Do, Gustavo Carneiro, Ian Reid | In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes. |
378 | BOP: Benchmark for 6D Object Pose Estimation | Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, Carsten Rother | We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. |
379 | 3D Vehicle Trajectory Reconstruction in Monocular Video Data Using Environment Structure Constraints | Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen | We present a framework to reconstruct three-dimensional vehicle trajectories using monocular video data. Due to the lack of suitable benchmark datasets we present a new dataset to evaluate the quality of reconstructed three-dimensional vehicle trajectories. |
380 | Appearance-Based Gaze Estimation via Evaluation-Guided Asymmetric Regression | Yihua Cheng, Feng Lu, Xucong Zhang | In this paper, we propose the Asymmetric Regression-Evaluation Network (ARE-Net), and try to improve the gaze estimation performance to its full extent. |
381 | Discriminative Region Proposal Adversarial Networks for High-Quality Image-to-Image Translation | Chao Wang, Haiyong Zheng, Zhibin Yu, Ziqiang Zheng, Zhaorui Gu, Bing Zheng | In this paper, we present Discriminative Region Proposal Adversarial Networks (DRPAN) for high-quality image-to-image translation. |
382 | SegStereo: Exploiting Semantic Information for Disparity Estimation | Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, Jiaya Jia | In this paper, we suggest that appropriate incorporation of semantic cues can greatly rectify prediction in commonly-used disparity estimation frameworks. |
383 | ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun | Taking these factors into account, this work proposes practical guidelines for efficient network design. |
384 | Deep Attention Neural Tensor Network for Visual Question Answering | Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei | In this paper, we propose a novel deep attention neural tensor network (DA-NTN) for visual question answering, which can discover the joint correlations over images, questions and answers with tensor-based representations. |
385 | Pairwise Body-Part Attention for Recognizing Human-Object Interactions | Hao-Shu Fang, Jinkun Cao, Yu-Wing Tai, Cewu Lu | In this paper, we argue that different body parts should receive different attention in HOI recognition, and the correlations between different body parts should be further considered. |
386 | Deep Clustering for Unsupervised Learning of Visual Features | Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze | In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. |
387 | Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features | Xu Yang, Hanwang Zhang, Jianfei Cai | To alleviate the bias, we propose a novel Shuffle-Then-Assemble pre-training strategy. |
388 | Learning to Look around Objects for Top-View Representations of Outdoor Scenes | Samuel Schulter, Menghua Zhai, Nathan Jacobs, Manmohan Chandraker | We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. |
389 | Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow | Eddy Ilg, Ozgun Cicek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox | In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. |
390 | Normalized Blind Deconvolution | Meiguang Jin, Stefan Roth, Paolo Favaro | In this paper we address this issue by looking at a much less studied aspect: the relative scale ambiguity between the sharp image and the blur. |
391 | Selfie Video Stabilization | Jiyang Yu, Ravi Ramamoorthi | We propose a novel algorithm for stabilizing selfie videos. |
392 | CubeNet: Equivariance to 3D Rotation and Translation | Daniel Worrall, Gabriel Brostow | We introduce a Group Convolutional Neural Network with linear equivariance to translations and right angle rotations in three dimensions. |
393 | Improving Generalization via Scalable Neighborhood Component Analysis | Zhirong Wu, Alexei A. Efros, Stella X. Yu | This paper adopts a non-parametric approach for visual recognition by optimizing feature embeddings instead of parametric classifiers. |
394 | Combining 3D Model Contour Energy and Keypoints for Object Tracking | Bogdan Bugaev, Anton Kryshchenko, Roman Belov | We present a new combined approach for monocular model-based 3D tracking. |
395 | Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation | Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing | To address these challenges for unsupervised video segmentation, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and edge cues. |
396 | Pairwise Confusion for Fine-Grained Visual Classification | Abhimanyu Dubey, Otkrist Gupta, Pei Guo, Ramesh Raskar, Ryan Farrell, Nikhil Naik | In this work, we address this problem using a novel optimization procedure for the end-to-end neural network training on FGVC tasks. |
397 | Modular Generative Adversarial Networks | Bo Zhao, Bo Chang, Zequn Jie, Leonid Sigal | Inspired by module networks, this paper proposes ModularGAN for multi-domain image-to-image translation that consists of several reusable and compatible modules of different functions. |
398 | Simultaneous 3D Reconstruction for Water Surface and Underwater Scene | Yiming Qian, Yinqiang Zheng, Minglun Gong, Yee-Hong Yang | This paper presents the first approach for simultaneously recovering the 3D shape of both the wavy water surface and the moving underwater scene. |
399 | Temporal Relational Reasoning in Videos | Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba | In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. |
400 | YouTube-VOS: Sequence-to-Sequence Video Object Segmentation | Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas Huang | To solve this problem, we build a new large-scale video object segmentation dataset called YouTube Video Object Segmentation dataset (YouTube-VOS). Based on this dataset, we propose a novel sequence-to-sequence network to fully exploit long-term spatial-temporal information in videos for segmentation. |
401 | Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input | David Harwath, Adria Recasens, Didac Suris, Galen Chuang, Antonio Torralba, James Glass | In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to. |
402 | Women also Snowboard: Overcoming Bias in Captioning Models | Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach | In this work we investigate generation of gender-specific caption words (e.g. man, woman) based on the person’s appearance or the image context. |
403 | Graph Distillation for Action Detection with Privileged Modalities | Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei | We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. |
404 | Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences | Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, Manmohan Chandraker | We propose concrete CNN architectures employing these ideas, and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets. |
405 | Proximal Dehaze-Net: A Prior Learning-Based Deep Network for Single Image Dehazing | Dong Yang, Jian Sun | In this paper, we propose a novel deep learning approach for single image dehazing by learning dark channel and transmission priors. |
406 | Deep Component Analysis via Alternating Direction Neural Networks | Calvin Murdock, MingFang Chang, Simon Lucey | For inference, we propose a differentiable optimization algorithm implemented using recurrent Alternating Direction Neural Networks (ADNNs) that enable parameter learning using standard backpropagation. |
407 | SDC-Net: Video prediction using spatially-displaced convolution | Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro | We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. |
408 | Exploiting temporal information for 3D human pose estimation | Mir Rayat Imtiaz Hossain, James J. Little | In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. |
409 | Joint Camera Spectral Sensitivity Selection and Hyperspectral Image Recovery | Ying Fu, Tao Zhang, Yinqiang Zheng, Debing Zhang, Hua Huang | In this paper, we present an efficient convolutional neural network (CNN) based method, which can jointly select the optimal CSS from a candidate dataset and learn a mapping to recover HSI from a single RGB image captured with this algorithmically selected camera. |
410 | ADVISE: Symbolism and External Knowledge for Decoding Advertisements | Keren Ye, Adriana Kovashka | We show how to use symbolic references to better understand the meaning of an ad. |
411 | Person Search via A Mask-guided Two-stream CNN Model | Di Chen, Shanshan Zhang, Wanli Ouyang, Jian Yang, Ying Tai | In this work, we tackle the problem of person search, which is a challenging task consisting of pedestrian detection and person re-identification (re-ID). |
412 | GridFace: Face Rectification via Learning Local Homography Transformations | Erjin Zhou, Zhimin Cao, Jian Sun | In this paper, we propose a novel method, called GridFace, to reduce facial geometric variations and improve the recognition performance. |
413 | Weakly-supervised Video Summarization using Variational Encoder-Decoder and Web Prior | Sijia Cai, Wangmeng Zuo, Larry S. Davis, Lei Zhang | To leverage the plentiful web-crawled videos to improve the performance of video summarization, we present a generative modelling framework to learn the latent semantic video representations to bridge the benchmark data and web data. |
414 | Compound Memory Networks for Few-shot Video Classification | Linchao Zhu, Yi Yang | In this paper, we propose a new memory network structure for few-shot video classification by making the following contributions. |
415 | Contextual-based Image Inpainting: Infer, Match, and Translate | Yuhang Song, Chao Yang, Zhe Lin, Xiaofeng Liu, Qin Huang, Hao Li, C.-C. Jay Kuo | To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. |
416 | Interpretable Intuitive Physics Model | Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta | Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. |
417 | Polarimetric Three-View Geometry | Lixiong Chen, Yinqiang Zheng, Art Subpa-asa, Imari Sato | We demonstrate that, in a multi-view system, the polarization phase obtained for a surface point is induced from one of the two pencils of planes: one by specular reflections with its axis aligned with the incident light; one by diffusive reflections with its axis aligned with the surface normal. |
418 | Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation | Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang | In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task. |
419 | Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images | Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan | Particularly, we propose a weakly-supervised method, adapting from a fully-annotated synthetic dataset to a weakly-labeled real-world dataset with the aid of a depth regularizer, which generates depth maps from predicted 3D pose and serves as weak supervision for 3D pose regression. |
420 | T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks | Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai | We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. |
421 | Instance-level Human Parsing via Part Grouping Network | Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin | In this work, we make the first attempt to explore a detection-free Part Grouping Network (PGN) for efficiently parsing multiple people in an image in a single pass. |
422 | TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes | Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao | To tackle this problem, we propose a more flexible representation for scene text, termed TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. |
423 | PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors | Haowen Deng, Tolga Birdal, Slobodan Ilic | We present PPF-FoldNet for unsupervised learning of 3D local descriptors on pure point cloud geometry. |
424 | Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association | Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Jing Shao, Zejian Yuan, Xiaogang Wang | In this paper, we propose to exploit natural language description as additional training supervisions for more effective features. |
425 | AMC: AutoML for Model Compression and Acceleration on Mobile Devices | Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han | In this paper, we propose AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality. |
426 | Robust fitting in computer vision: easy or hard? | Tat-Jun Chin, Zhipeng Cai, Frank Neumann | To shed light on these issues, we present several computational hardness results for consensus maximisation. |
427 | Graph Adaptive Knowledge Transfer for Unsupervised Domain Adaptation | Zhengming Ding, Sheng Li, Ming Shao, Yun Fu | To address that issue, we develop a novel Graph Adaptive Knowledge Transfer (GAKT) model to jointly optimize target labels and domain-free features in a unified framework. |
428 | Single Image Intrinsic Decomposition without a Single Intrinsic Image | Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba | In this paper, we propose to bring the best of both worlds. |
429 | Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders | Ananya Harsh Jha, Saket Anand, Maneesh Singh, VSR Veeravasarapu | In this paper, we introduce a novel architecture that disentangles the latent space into two complementary subspaces by using only weak supervision in form of pairwise similarity labels. By sampling from the disentangled latent subspace of interest, we can efficiently generate new data necessary for a particular task. |
430 | Deep Multi-Task Learning to Recognise Subtle Facial Expressions of Mental States | Guosheng Hu, Li Liu, Yang Yuan, Zehao Yu, Yang Hua, Zhihong Zhang, Fumin Shen, Ling Shao, Timothy Hospedales, Neil Robertson, Yongxin Yang | We address subtle expression recognition through convolutional neural networks (CNNs) by developing multi-task learning (MTL) to effectively leverage a side task: facial landmark detection. |
431 | SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation | Wenqiang Xu, Yonglu Li, Cewu Lu | By combining the advantages of 3D scanning, reasoning, and GAN-based domain adaptation techniques, we introduce a novel pipeline named SRDA to obtain large quantities of training samples with very minor effort. To evaluate our performance, we build three representative scenes and a new dataset, with 3D models of various common objects categories and annotated real-world scene images. |
432 | DeepWrinkles: Accurate and Realistic Clothing Modeling | Zorah Lahner, Daniel Cremers, Tony Tung | We present a novel method to generate accurate and realistic clothing deformation from real data capture. |
433 | Recovering 3D Planes from a Single Image via Convolutional Neural Networks | Fengting Yang, Zihan Zhou | In this paper, we study the problem of recovering 3D planar surfaces from a single image of man-made environment. |
434 | Learning 3D Shapes as Multi-Layered Height-maps using 2D Convolutional Networks | Kripasindhu Sarkar, Basavaraj Hampiholi, Kiran Varanasi, Didier Stricker | We present a novel global representation of 3D shapes, suitable for the application of 2D CNNs. |
435 | A Geometric Perspective on Structured Light Coding | Mohit Gupta, Nikhil Nakhate | We present a mathematical framework for analysis and design of high performance structured light (SL) coding schemes. |
436 | Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam | In this work, we propose to combine the advantages from both methods. |
437 | Robust image stitching with multiple registrations | Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih | We propose instead the use of multiple registrations, permitting regions of the image at different depths to be captured with greater accuracy. |
438 | Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network | Xinjing Cheng, Peng Wang, Ruigang Yang | In this paper, we propose a simple yet effective convolutional spatial propagation network (CSPN) to learn the affinity matrix for depth prediction. |
439 | Object-centered image stitching | Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Ramin Zabih | We therefore take an object-centered approach to the problem, leveraging recent advances in object detection. |
440 | Learning to Dodge A Bullet: Concyclic View Morphing via Deep Learning | Shi Jin, Ruiyang Liu, Yu Ji, Jinwei Ye, Jingyi Yu | In this paper, we present a learning-based solution that is capable of producing the bullet-time effect from only a small set of images. |
441 | CTAP: Complementary Temporal Action Proposal Generation | Jiyang Gao, Kan Chen, Ram Nevatia | Based on the complementary characteristics of these two methods, we propose a novel Complementary Temporal Action Proposal (CTAP) generator. |
442 | Effective Use of Synthetic Data for Urban Scene Semantic Segmentation | Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez | In this paper, we introduce a drastically different way to handle synthetic images that does not require seeing any real images at training time. |
443 | ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems | Yinda Zhang, Sameh Khamis, Christoph Rhemann, Julien Valentin, Adarsh Kowdle, Vladimir Tankovich, Michael Schoenberg, Shahram Izadi, Thomas Funkhouser, Sean Fanello | In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. |
444 | ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids | Dinesh Jayaraman, Ruohan Gao, Kristen Grauman | We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. |
445 | Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency | Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang | In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image. |
446 | Learning Discriminative Video Representations Using Adversarial Perturbations | Jue Wang, Anoop Cherian | In this paper, we propose to use such perturbations for improving the robustness of video representations. Using the original data features from the full video sequence and their perturbed counterparts, as two separate bags, we develop a binary classification problem that learns a set of discriminative hyperplanes — as a subspace — that will separate the two bags from each other. |
447 | BSN: Boundary Sensitive Network for Temporal Action Proposal Generation | Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang | To address these difficulties, we introduce an effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts a “local to global” fashion. |
448 | In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video | Yin Li, Miao Liu, James M. Rehg | We propose a novel deep model for joint gaze estimation and action recognition in First Person Vision. |
449 | Compositional Learning for Human Object Interaction | Keizo Kato, Yin Li, Abhinav Gupta | In this paper, we explore the problem of zero-shot learning of human-object interactions. We also provide benchmarks on several datasets for zero-shot learning including both image and video. |
450 | Open-World Stereo Video Matching with Deep RNN | Yiran Zhong, Hongdong Li, Yuchao Dai | In this paper, we propose a novel deep Recurrent Neural Network (RNN) that takes a continuous (possibly previously unseen) stereo video as input, and directly predicts a depth map without any pre-training process. |
451 | stagNet: An Attentive Semantic RNN for Group Activity Recognition | Mengshi Qi, Jie Qin, Annan Li, Yunhong Wang, Jiebo Luo, Luc Van Gool | We propose a novel attentive semantic recurrent neural network (RNN), namely stagNet, for understanding group activities in videos, based on the spatio-temporal attention and semantic graph. |
452 | Double JPEG Detection in Mixed JPEG Quality Factors using Deep Convolutional Neural Network | Jinseok Park, Donghyeon Cho, Wonhyuk Ahn, Heung-Kyu Lee | This paper proposes a novel deep convolutional neural network for double JPEG detection using statistical histogram features from each block with a vectorized quantization table. We collected real-world JPEG images from the image forensic service and generated a new double JPEG dataset with 1120 quantization tables to train the network. |
453 | Deep High Dynamic Range Imaging with Large Foreground Motions | Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang | This paper proposes the first non-flow-based deep framework for high dynamic range (HDR) imaging of dynamic scenes with large-scale foreground motions. |
454 | Learning 3D Keypoint Descriptors for Non-Rigid Shape Matching | Hanyu Wang, Jianwei Guo, Dong-Ming Yan, Weize Quan, Xiaopeng Zhang | In this paper, we present a novel deep learning framework that derives discriminative local descriptors for 3D surface shapes. |
455 | Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition | Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen | To tackle such problem, we propose a coupled dictionary learning approach to align the visual-semantic structures using the class prototypes, where the discriminative information lying in the visual space is utilized to improve the less discriminative semantic space. |
456 | CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images | Sheng Guo, Weilin Huang, Haozhi Zhang, Chenfan Zhuang, Dengke Dong, Matthew R. Scott, Dinglong Huang | We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly-supervised web images, which are crawled raw from the Internet by using text queries, without any human annotation. |
457 | A Trilateral Weighted Sparse Coding Scheme for Real-World Image Denoising | Jun Xu, Lei Zhang, David Zhang | In this paper, we develop a trilateral weighted sparse coding (TWSC) scheme for robust real-world image denoising. |
458 | Linear Span Network for Object Skeleton Detection | Chang Liu, Wei Ke, Fei Qin, Qixiang Ye | In this paper, we first re-visit the implementation of HED, the essential principle of which can be ideally described with a linear reconstruction model. |
459 | DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs | Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu | We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high quality color image through a joint training strategy. |
460 | ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes | Taihong Xiao, Jiapeng Hong, Jinwen Ma | To address these limitations, we propose a novel model which receives two images of opposite attributes as inputs. |
461 | Progressive Structure from Motion | Alex Locher, Michal Havlena, Luc Van Gool | In this paper we propose a new reconstruction pipeline working in a progressive manner rather than in a batch processing scheme. |
462 | GAL: Geometric Adversarial Loss for Single-View 3D-Object Reconstruction | Li Jiang, Shaoshuai Shi, Xiaojuan Qi, Jiaya Jia | In this paper, we present a framework for reconstructing a point-based 3D model of an object from a single view image. |
463 | Viewpoint Estimation—Insights & Model | Gilad Divon, Ayellet Tal | This paper addresses the problem of viewpoint estimation of an object in a given image. |
464 | Super-Resolution and Sparse View CT Reconstruction | Guangming Zang, Mohamed Aly, Ramzi Idoughi, Peter Wonka, Wolfgang Heidrich | We present a flexible framework for robust computed tomography (CT) reconstruction with a specific emphasis on recovering thin 1D and 2D manifolds embedded in 3D volumes. |
465 | NNEval: Neural Network based Evaluation Metric for Image Captioning | Naeha Sharif, Lyndon White, Mohammed Bennamoun, Syed Afaq Ali Shah | In this paper, we present the first learning-based metric to evaluate image captions. |
466 | Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Refinement | Minhyeok Heo, Jaehan Lee, Kyung-Rae Kim, Han-Ul Kim, Chang-Su Kim | We propose a monocular depth estimation algorithm, which extracts a depth map from a single image, based on whole strip masking (WSM) and reliability-based refinement. |
467 | Dynamic Filtering with Large Sampling Field for ConvNets | Jialin Wu, Dai Li, Yu Yang, Chandrajit Bajaj, Xiangyang Ji | We propose a dynamic filtering strategy with large sampling field for ConvNets (LS-DFN), where the position-specific kernels learn from not only the identical position but also multiple sampled neighbour regions. |
468 | SaaS: Speed as a Supervisor for Semi-supervised Learning | Safa Cicek, Alhussein Fawzi, Stefano Soatto | We introduce the SaaS Algorithm for semi-supervised learning, which uses learning speed during stochastic gradient descent in a deep neural network to measure the quality of an iterative estimate of the posterior probability of unknown labels. |
469 | AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos | Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang | We propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the needed segment-level supervision for training such a boundary predictor. |
470 | Local Spectral Graph Convolution for Point Set Feature Learning | Chu Wang, Babak Samari, Kaleem Siddiqi | In the present article, we propose to overcome this limitation by using spectral graph convolution on a local graph, combined with a novel graph pooling strategy. |
471 | Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights | Arun Mallya, Dillon Davis, Svetlana Lazebnik | This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. |
472 | VideoMatch: Matching based Video Object Segmentation | Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing | To address this issue, we develop a novel matching based algorithm for video object segmentation. |
473 | Wasserstein Divergence for GANs | Jiqing Wu, Zhiwu Huang, Janine Thoma, Dinesh Acharya, Luc Van Gool | As a concrete application, we introduce a Wasserstein divergence objective for GANs (WGAN-div), which can faithfully approximate W-div through optimization. |
474 | Semi-supervised FusedGAN for Conditional Image Generation | Navaneeth Bodla, Gang Hua, Rama Chellappa | We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. |
475 | Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms | Arjun Nitin Bhagoji, Warren He, Bo Li, Dawn Song | In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model’s class probabilities, which do not rely on transferability. |
476 | PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model | George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy | We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. |
477 | Context Refinement for Object Detection | Zhe Chen, Shaoli Huang, Dacheng Tao | To address this problem, we propose a context refinement algorithm that explores rich contextual information to better refine each proposed region. |
478 | Attention-GAN for Object Transfiguration in Wild Images | Xinyuan Chen, Chang Xu, Xiaokang Yang, Dacheng Tao | This paper studies the object transfiguration problem in wild images. |
479 | Pose Guided Human Video Generation | Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, Dahua Lin | In this paper, we propose a pose guided method to synthesize human videos in a disentangled way: plausible motion prediction and coherent appearance generation. |
480 | Exploring the Limits of Weakly Supervised Pretraining | Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten | In this paper, we present a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images. |
481 | Exploiting Vector Fields for Geometric Rectification of Distorted Document Images | Gaofeng Meng, Yuanqi Su, Ying Wu, Shiming Xiang, Chunhong Pan | This paper proposes a segment-free method for geometric rectification of a distorted document image captured by a hand-held camera. |
482 | Task-driven Webpage Saliency | Quanlong Zheng, Jianbo Jiao, Ying Cao, Rynson W.H. Lau | In this paper, we present an end-to-end learning framework for predicting task-driven visual saliency on webpages. |
483 | Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation | Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Mingyan Liu, Dawn Song | In this paper, we aim to characterize adversarial examples based on spatial context information in segmentation. |
484 | DYAN: A Dynamical Atoms-Based Network For Video Prediction | Wenqian Liu, Abhishek Sharma, Octavia Camps, Mario Sznaier | In this paper, we introduce DYAN, a novel network that has very few parameters and is easy to train, and that produces accurate, high quality frame predictions significantly faster than previous approaches. |
485 | SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters | Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, Yu Qiao | Towards this we propose a novel convolutional architecture, termed SpiderCNN, to efficiently extract geometric features from point clouds. |
486 | Hard-Aware Point-to-Set Deep Metric for Person Re-identification | Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, Xiang Bai | To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S) loss with a soft hard-mining scheme. |
487 | Coded Two-Bucket Cameras for Computer Vision | Mian Wei, Navid Sarhangnejad, Zhengfan Xia, Nikita Gusev, Nikola Katic, Roman Genov, Kiriakos N. Kutulakos | We introduce coded two-bucket (C2B) imaging, a new operating principle for computational sensors with applications in active 3D shape estimation and coded-exposure imaging. |
488 | Egocentric Activity Prediction via Event Modulated Attention | Yang Shen, Bingbing Ni, Zefan Li, Ning Zhuang | This work explicitly addresses these issues by proposing an asynchronous gaze-event driven attentive activity prediction network. |
489 | Real-Time MDNet | Ilchae Jung, Jeany Son, Mooyeol Baek, Bohyung Han | We present a fast and accurate visual tracking algorithm based on the multi-domain convolutional neural network (MDNet). |
490 | Image Generation from Sketch Constraint Using Contextual GAN | Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang | In this paper we investigate image generation guided by hand sketch. |
491 | Real-Time Hair Rendering using Sequential Adversarial Networks | Lingyu Wei, Liwen Hu, Vladimir Kim, Ersin Yumer, Hao Li | We present an adversarial network for rendering photorealistic hair as an alternative to conventional computer graphics pipelines. |
492 | Sparsely Aggregated Convolutional Networks | Ligeng Zhu, Ruizhi Deng, Michael Maire, Zhiwei Deng, Greg Mori, Ping Tan | We propose a new internal connection structure which aggregates only a sparse set of previous outputs at any given depth. |
493 | Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors | Dmitry Baranchuk, Artem Babenko, Yury Malkov | In this paper, we argue that the potential of the simple inverted index was not fully exploited in previous works and advocate its usage both for the highly-entangled deep descriptors and relatively disentangled SIFT descriptors. |
494 | Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation | Zhenyu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, Jian Yang | In this paper, we propose a novel joint Task-Recursive Learning (TRL) framework for the closing-loop semantic segmentation and monocular depth estimation tasks. |
495 | Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network | Namhyuk Ahn, Byungkon Kang, Kyung-Ah Sohn | In this paper, we address this issue by proposing an accurate and lightweight deep network for image super-resolution. |
496 | Deep Image Demosaicking using a Cascade of Convolutional Residual Denoising Networks | Filippos Kokkinos, Stamatios Lefkimmiatis | This improvement in reconstruction quality is attributed to the principled way we design our network architecture, which also requires fewer trainable parameters than the current state-of-the-art deep network solution. |
497 | Modality Distillation with Multiple Stream Networks for Action Recognition | Nuno C. Garcia, Pietro Morerio, Vittorio Murino | This paper presents a new approach for multimodal video action recognition, developed within the unified frameworks of distillation and privileged information, named generalized distillation. |
498 | Direct Sparse Odometry With Rolling Shutter | David Schubert, Nikolaus Demmel, Vladyslav Usenko, Jorg Stuckler, Daniel Cremers | In this paper, we propose a novel direct monocular VO method that incorporates a rolling-shutter model. |
499 | Multi-Class Model Fitting by Energy Minimization and Mode-Seeking | Daniel Barath, Jiri Matas | Considering that a group of outliers may form spatially coherent structures in the data, we propose a cross-validation-based technique removing statistically insignificant instances. |
500 | Model-free Consensus Maximization for Non-Rigid Shapes | Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool | In this paper, we formulate the model-free consensus maximization as an Integer Program in a graph using ‘rules’ on measurements. |
501 | How good is my GAN? | Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari | In this paper we introduce two measures based on image classification—GAN-train and GAN-test, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. |
502 | Pose Partition Networks for Multi-Person Pose Estimation | Xuecheng Nie, Jiashi Feng, Junliang Xing, Shuicheng Yan | This paper proposes a novel Pose Partition Network (PPN) to address the challenging multi-person pose estimation problem. |
503 | 3D-CODED: 3D Correspondences by Deep Deformation | Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, Mathieu Aubry | We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks which jointly encode 3D shapes and correspondences. |
504 | Interpretable Basis Decomposition for Visual Explanation | Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba | In this work we propose a new framework called Interpretable Basis Decomposition for providing visual explanations for classification networks. |
505 | Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry | Nan Yang, Rui Wang, Jorg Stuckler, Daniel Cremers | In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. |
506 | HandMap: Robust Hand Pose Estimation via Intermediate Dense Guidance Map Supervision | Xiaokun Wu, Daniel Finnegan, Eamonn O’Neill, Yong-Liang Yang | This work presents a novel hand pose estimation framework via intermediate dense guidance map supervision. |
507 | Partial Adversarial Domain Adaptation | Zhangjie Cao, Lijia Ma, Mingsheng Long, Jianmin Wang | This paper introduces partial domain adaptation as a new domain adaptation scenario, which relaxes the fully shared label space assumption to the assumption that the source label space subsumes the target label space. |
508 | ExFuse: Enhancing Feature Fusion for Semantic Segmentation | Zhenli Zhang, Xiangyu Zhang, Chao Peng, Xiangyang Xue, Jian Sun | Based on this observation, we propose a new framework, named ExFuse, to bridge the gap between low-level and high-level features and thus significantly improve the segmentation quality by 4.0% in total. |
509 | Audio-Visual Event Localization in Unconstrained Videos | Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu | In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. |
510 | Understanding Degeneracies and Ambiguities in Attribute Transfer | Attila Szabo, Qiyang Hu, Tiziano Portenier, Matthias Zwicker, Paolo Favaro | To address the shortcut problem, we introduce novel constraints on image pairs and triplets and show their effectiveness both analytically and experimentally. |
511 | Relaxation-Free Deep Hashing via Policy Gradient | Xin Yuan, Liangliang Ren, Jiwen Lu, Jie Zhou | In this paper, we propose a simple yet effective relaxation-free method to learn more effective binary codes via policy gradient for scalable image search. |
512 | How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization | Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong | In this paper, we propose a novel probabilistic model, built upon SeqDPP, to dynamically control the time span of a video segment upon which the local diversity is imposed. |
513 | Question Type Guided Attention in Visual Question Answering | Yang Shi, Tommaso Furlanello, Sheng Zha, Animashree Anandkumar | In this work, we propose Question Type-guided Attention (QTA). |
514 | Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics | Matthias Kummerer, Thomas S. A. Wallis, Matthias Bethge | Instead, we propose a principled approach to solve the benchmarking problem by separating the notions of saliency models, maps and metrics. |
515 | A Unified Framework for Multi-View Multi-Class Object Pose Estimation | Chi Li, Jin Bai, Gregory D. Hager | In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. |
516 | A New Large Scale Dynamic Texture Dataset with Application to ConvNet Understanding | Isma Hadji, Richard P. Wildes | This paper introduces a new large scale dynamic texture dataset. |
517 | Dynamic Task Prioritization for Multitask Learning | Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, Li Fei-Fei | We propose dynamic task prioritization for multitask learning. |
518 | Deep Feature Factorization For Concept Discovery | Edo Collins, Radhakrishna Achanta, Sabine Susstrunk | We propose Deep Feature Factorization (DFF), a method capable of localizing similar semantic concepts within an image or a set of images. |
519 | Diverse feature visualizations reveal invariances in early layers of deep neural networks | Santiago A. Cadena, Marissa A. Weis, Leon A. Gatys, Matthias Bethge, Alexander S. Ecker | Here we propose a method to discover invariances in the responses of hidden layer units of deep neural networks. |
520 | Reinforced Temporal Attention and Split-Rate Transfer for Depth-Based Person Re-Identification | Nikolaos Karianakis, Zicheng Liu, Yinpeng Chen, Stefano Soatto | We address the problem of person re-identification from commodity depth sensors. |
521 | NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications | Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. |
522 | Estimating Depth from RGB and Sparse Sensing | Zhao Chen, Vijay Badrinarayanan, Gilad Drozdov, Andrew Rabinovich | We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. |
523 | Grounding Visual Explanations | Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata | To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations augmented with flipped phrases which we use as negative examples while training. |
524 | End-to-End Incremental Learning | Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Cordelia Schmid, Karteek Alahari | We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. |
525 | Toward Scale-Invariance and Position-Sensitive Region Proposal Networks | Hsueh-Fu Lu, Xiaofei Du, Ping-Lin Chang | In this work, we propose an advanced object proposal network in favour of translation-invariance for objectness classification, translation-variance for bounding box regression, large effective receptive fields for capturing global context and scale-invariance for dealing with a range of object sizes from extremely small to large. |
526 | Deep Regression Tracking with Shrinkage Loss | Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, Ming-Hsuan Yang | To balance training data, we propose a novel shrinkage loss to penalize the importance of easy training data. |
527 | A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers | Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Jian Tang, Wujie Wen, Makan Fardad, Yanzhi Wang | To mitigate these limitations, we present a systematic weight pruning framework of DNNs using the alternating direction method of multipliers (ADMM). |
528 | Adversarial Open-World Person Re-Identification | Xiang Li, Ancong Wu, Wei-Shi Zheng | In this work, we introduce a deep open-world group-based person re-id model based on adversarial learning to alleviate the attack problem caused by similar non-target people. |
529 | Conditional Image-Text Embedding Networks | Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik | This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. |
530 | DeepIM: Deep Iterative Matching for 6D Pose Estimation | Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, Dieter Fox | In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. |
531 | Dist-GAN: An Improved GAN using Distance Constraints | Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung | We introduce effective training algorithms for Generative Adversarial Networks (GAN) to alleviate mode collapse and gradient vanishing. |
532 | Pivot Correlational Neural Network for Multimodal Video Categorization | Sunghun Kang, Junyeong Kim, Hyunsoo Choi, Sungjin Kim, Chang D. Yoo | This paper considers an architecture for multimodal video categorization referred to as Pivot Correlational Neural Network (Pivot CorrNN). |
533 | Generative Domain-Migration Hashing for Sketch-to-Image Retrieval | Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc Van Gool | In this work, we propose a Generative Domain-migration Hashing (GDH) approach, which for the first time generates hashing codes from synthetic natural images that are migrated from sketches. |
534 | TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights | Diwen Wan, Fumin Shen, Li Liu, Fan Zhu, Jie Qin, Ling Shao, Heng Tao Shen | In this work, we propose a Ternary-Binary Network (TBN), which provides an efficient approximation to standard CNNs. |
535 | Multi-object Tracking with Neural Gating Using Bilinear LSTM | Chanho Kim, Fuxin Li, James M. Rehg | In this paper, we propose a novel recurrent network model, the bilinear LSTM, in order to improve long-term appearance models via a recurrent network. |
536 | Highly-Economized Multi-View Binary Compression for Scalable Image Clustering | Zheng Zhang, Li Liu, Jie Qin, Fan Zhu, Fumin Shen, Yong Xu, Ling Shao, Heng Tao Shen | To tackle this challenge, this paper introduces a novel approach named Highly-economized Scalable Image Clustering (HSIC) that radically surpasses conventional image clustering methods via binary compression. |
537 | Part-Aligned Bilinear Representations for Person Re-Identification | Yumin Suh, Jingdong Wang, Siyu Tang, Tao Mei, Kyoung Mu Lee | In this paper, we propose a network that learns a part-aligned representation for person re-identification. |
538 | End-to-end View Synthesis for Light Field Imaging with Pseudo 4DCNN | Yunlong Wang, Fei Liu, Zilei Wang, Guangqi Hou, Zhenan Sun, Tieniu Tan | In this paper, an end-to-end deep learning framework is proposed to solve these problems by exploring Pseudo 4DCNN. |
539 | Action Anticipation with RBF Kernelized Feature Mapping RNN | Yuge Shi, Basura Fernando, Richard Hartley | We introduce a novel Recurrent Neural Network-based algorithm for future video feature generation and action anticipation called feature mapping RNN. |
540 | Joint Blind Motion Deblurring and Depth Estimation of Light Field | Dongwoo Lee, Haesol Park, In Kyu Park, Kyoung Mu Lee | In this paper, we propose a novel algorithm to estimate all blur model variables jointly, including latent sub-aperture image, camera motion, and scene depth from the blurred 4D light field. |
541 | Learning to Navigate for Fine-grained Classification | Ze Yang, Tiange Luo, Dong Wang, Zhiqiang Hu, Jun Gao, Liwei Wang | To handle this circumstance, we propose a novel self-supervision mechanism to effectively localize informative regions without the need of fine-grained bounding-box/part annotations. |
542 | Specular-to-Diffuse Translation for Multi-View Reconstruction | Shihao Wu, Hui Huang, Tiziano Portenier, Matan Sela, Daniel Cohen-Or, Ron Kimmel, Matthias Zwicker | To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. In addition, we carefully design and generate a large synthetic training data set using physically-based rendering. |
543 | Clustering Convolutional Kernels to Compress Deep Neural Networks | Sanghyun Son, Seungjun Nah, Kyoung Mu Lee | In this paper, we propose a novel method to compress CNNs by reconstructing the network from a small set of spatial convolution kernels. |
544 | Scale Aggregation Network for Accurate and Efficient Crowd Counting | Xinkun Cao, Zhipeng Wang, Yanyun Zhao, Fei Su | In this paper, we propose a novel encoder-decoder network, called Scale Aggregation Network (SANet), for accurate and efficient crowd counting. |
545 | Fine-Grained Visual Categorization using Meta-Learning Optimization with Sample Selection of Auxiliary Data | Yabin Zhang, Hui Tang, Kui Jia | To address this issue, we propose in this paper a new deep FGVC model termed MetaFGNet. |
546 | Sampling Algebraic Varieties for Robust Camera Autocalibration | Danda Pani Paudel, Luc Van Gool | During the BnB search, we exploit the theory of sampling algebraic varieties, to test the positivity of any polynomial within a parameter’s interval, i.e. outliers with certainty. |
547 | Stacked Cross Attention for Image-Text Matching | Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He | In this paper, we study the problem of image-text matching. |
548 | Data-Driven Sparse Structure Selection for Deep Neural Networks | Zehao Huang, Naiyan Wang | In this paper, we propose a simple and effective framework to learn and prune deep models in an end-to-end manner. |
549 | DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks | Weixuan Chen, Daniel McDuff | We propose the first end-to-end system for video-based measurement of heart and breathing rate using a deep convolutional network. |
550 | Attribute-Guided Face Generation Using Conditional CycleGAN | Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang | To address this problem, we condition the CycleGAN and propose conditional CycleGAN, which is designed to 1) handle unpaired training data because the training low/high-res and high-res attribute images may not necessarily align with each other, and to 2) allow easy control of the appearance of the generated face via the input attributes. |
551 | On the Solvability of Viewing Graphs | Matthew Trager, Brian Osserman, Jean Ponce | We study characterizations of “solvable” viewing graphs, and present several new results that can be applied to determine which pairs of views may be used to recover all camera parameters. |
552 | A-Contrario Horizon-First Vanishing Point Detection Using Second-Order Grouping Laws | Gilles Simon, Antoine Fond, Marie-Odile Berger | We show that, in images of man-made environments, the horizon line can usually be hypothesized based on an a contrario detection of second-order grouping events. |
553 | Deep Volumetric Video From Very Sparse Multi-View Performance Capture | Zeng Huang, Tianye Li, Weikai Chen, Yajie Zhao, Jun Xing, Chloe LeGendre, Linjie Luo, Chongyang Ma, Hao Li | We present a deep learning-based volumetric capture approach for performance capture using a passive and highly sparse multi-view capture system. |
554 | Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes | Fangneng Zhan, Shijian Lu, Chuhui Xue | This paper presents a novel image synthesis technique that aims to generate a large amount of annotated scene text images for training accurate and robust scene text detection and recognition models. |
555 | Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping | Chuhui Xue, Shijian Lu, Fangneng Zhan | This paper presents a scene text detection technique that exploits bootstrapping and text border semantics for accurate localization of texts in scenes. |
556 | RT-GENE: Real-Time Eye Gaze Estimation in Natural Environments | Tobias Fischer, Hyung Jin Chang, Yiannis Demiris | In this work, we consider the problem of robust gaze estimation in natural environments. |
557 | Deep Video Generation, Prediction and Completion of Human Action Sequences | Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang | In this paper, we focus on human action videos, and propose a general, two-stage deep framework to generate human action videos with no constraints or an arbitrary number of constraints, which uniformly addresses three problems: video generation given no input frames, video prediction given the first few frames, and video completion given the first and last frames. |
558 | Quantization Mimic: Towards Very Tiny CNN for Object Detection | Yi Wei, Xinyu Pan, Hongwei Qin, Wanli Ouyang, Junjie Yan | In this paper, we propose a simple and general framework for training very tiny CNNs for object detection. |
559 | Deep Structure Inference Network for Facial Action Unit Recognition | Ciprian Corneanu, Meysam Madadi, Sergio Escalera | In this paper, we propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. |
560 | Deep Shape Matching | Filip Radenovic, Giorgos Tolias, Ondrej Chum | We cast shape matching as metric learning with convolutional networks. |
561 | Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses | Zheng Dang, Kwang Moo Yi, Yinlin Hu, Fei Wang, Pascal Fua, Mathieu Salzmann | In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. |
562 | Efficient Semantic Scene Completion Network with Spatial Group Convolution | Jiahui Zhang, Hao Zhao, Anbang Yao, Yurong Chen, Li Zhang, Hongen Liao | We introduce Spatial Group Convolution (SGC) for accelerating the computation of 3D dense prediction tasks. |
563 | Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification | Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, Weiming Hu | To address this, we propose an effective interaction-aware self-attention model inspired by PCA to learn attention maps. |
564 | Deep Texture and Structure Aware Filtering Network for Image Smoothing | Kaiyue Lu, Shaodi You, Nick Barnes | In this paper, we tackle the natural deficiency of existing methods, that they cannot properly distinguish textures and structures with similar low-level appearance. To this end, we generate a large dataset by blending natural textures with clean structure-only images, and then build a texture prediction network (TPN) that predicts location and magnitude of textures. |
565 | Learning to Solve Nonlinear Least Squares for Monocular Stereo | Ronald Clark, Michael Bloesch, Jan Czarnowski, Stefan Leutenegger, Andrew J. Davison | In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. |
566 | Unsupervised Class-Specific Deblurring | Thekke Madam Nimisha, Kumar Sunil, A. N. Rajagopalan | In this paper, we present an end-to-end deblurring network designed specifically for a class of data. |
567 | VSO: Visual Semantic Odometry | Konstantinos-Nektarios Lianos, Johannes L. Schonberger, Marc Pollefeys, Torsten Sattler | In this paper, we propose a novel visual semantic odometry (VSO) framework to enable medium-term continuous tracking of points using semantics. |
568 | Semantic Match Consistency for Long-Term Visual Localization | Carl Toft, Erik Stenborg, Lars Hammarstrand, Lucas Brynte, Marc Pollefeys, Torsten Sattler, Fredrik Kahl | In this paper, we present a method for scoring the individual correspondences by exploiting semantic information about the query image and the scene. |
569 | Learning Priors for Semantic 3D Reconstruction | Ian Cherabier, Johannes L. Schonberger, Martin R. Oswald, Marc Pollefeys, Andreas Geiger | We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. |
570 | The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking | Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, Qi Tian | In this paper, we construct a new UAV benchmark focusing on complex scenarios with new level challenges. |
571 | Learning with Biased Complementary Labels | Xiyu Yu, Tongliang Liu, Mingming Gong, Dacheng Tao | In this paper, we study the classification problem in which we have access to an easily obtainable surrogate for true labels, namely complementary labels, which specify classes that observations do not belong to. |
572 | NAM: Non-Adversarial Unsupervised Domain Mapping | Yedid Hoshen, Lior Wolf | In this work, we introduce an alternative method: Non-Adversarial Mapping (NAM), which separates the task of target domain generative modeling from the cross-domain mapping task. |
573 | Motion Feature Network: Fixed Motion Filter for Action Recognition | Myunggi Lee, Seungeui Lee, Sungjoon Son, Gyutae Park, Nojun Kwak | In this paper, we propose MFNet (Motion Feature Network) containing motion blocks which make it possible to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end. |
574 | Transferable Adversarial Perturbations | Wen Zhou, Xin Hou, Yongjun Chen, Mengyun Tang, Xiangqi Huang, Xiang Gan, Yong Yang | In this paper, we propose a novel way of generating perturbations for adversarial examples to enable black-box transfer. |
575 | Semantically Aware Urban 3D Reconstruction with Plane-Based Regularization | Thomas Holzmann, Michael Maurer, Friedrich Fraundorfer, Horst Bischof | We propose a method for urban 3D reconstruction, which incorporates semantic information and plane priors within the reconstruction process in order to generate visually appealing 3D models. |
576 | Learning Type-Aware Embeddings for Fashion Compatibility | Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth | This paper presents an approach to learning an image embedding that respects item type, and jointly learns notions of item similarity and compatibility in an end-to-end model. |
577 | Visual Reasoning with Multi-hop Feature Modulation | Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jeremie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin | visual dialogue task and matches state-of-the-art on the ReferIt object retrieval task, and we provide additional qualitative analysis. |
578 | Object Detection in Video with Spatiotemporal Sampling Networks | Gedas Bertasius, Lorenzo Torresani, Jianbo Shi | We propose a Spatiotemporal Sampling Network (STSN) that uses deformable convolutions across time for object detection in videos. |
579 | Diverse Conditional Image Generation by Stochastic Regression with Latent Drop-Out Codes | Yang He, Bernt Schiele, Mario Fritz | We propose a novel and efficient stochastic regression approach with latent drop-out codes that combines the merits of both lines of research. |
580 | Extreme Network Compression via Filter Group Approximation | Bo Peng, Wenming Tan, Zheyang Li, Shun Zhang, Di Xie, Shiliang Pu | In this paper we propose a novel decomposition method based on filter group approximation, which can significantly reduce the redundancy of deep convolutional neural networks (CNNs) while maintaining the majority of feature representation. |
581 | Efficient Sliding Window Computation for NN-Based Template Matching | Lior Talker, Yael Moses, Ilan Shimshoni | We therefore propose in this paper an efficient NN-based algorithm. |
582 | MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models | Siddharth Tourani, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy | This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. |
583 | Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model | Jie Guo, Zuojian Zhou, Limin Wang | We propose a sparse and low-rank reflection model for specular highlight detection and removal using a single input image. |
584 | ArticulatedFusion: Real-time Reconstruction of Motion, Geometry and Segmentation Using a Single Depth Camera | Chao Li, Zheheng Zhao, Xiaohu Guo | This paper proposes a real-time dynamic scene reconstruction method capable of reproducing the motion, geometry, and segmentation simultaneously given live depth stream from a single RGB-D camera. |
585 | Museum Exhibit Identification Challenge for the Supervised Domain Adaptation and Beyond | Piotr Koniusz, Yusuf Tas, Hongguang Zhang, Mehrtash Harandi, Fatih Porikli, Rui Zhang | We study an open problem of artwork identification and propose a new dataset dubbed Open Museum Identification Challenge (Open MIC). |
586 | Reconstruction-based Pairwise Depth Dataset for Depth Image Enhancement Using CNN | Junho Jeon, Seungyong Lee | In this paper, we propose a pairwise depth image dataset generation method using dense 3D surface reconstruction with a filtering method to remove low quality pairs. |
587 | MRF Optimization with Separable Convex Prior on Partially Ordered Labels | Csaba Domokos, Frank R. Schmidt, Daniel Cremers | In this paper we propose a generalization to partially ordered sets. |
588 | Deep Generative Models for Weakly-Supervised Multi-Label Classification | Hong-Min Chu, Chih-Kuan Yeh, Yu-Chiang Frank Wang | In this paper, we tackle WS-MLC by learning deep generative models for describing the collected data. |
589 | Attend and Rectify: a gated attention mechanism for fine-grained recovery | Pau Rodriguez, Josep M. Gonfaus, Guillem Cucurull, F. Xavier Roca, Jordi Gonzalez | We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. |
590 | ADVIO: An Authentic Dataset for Visual-Inertial Odometry | Santiago Cortes, Arno Solin, Esa Rahtu, Juho Kannala | We take advantage of advances in pure inertial navigation, and develop a set of versatile and challenging real-world computer vision benchmark sets for visual-inertial odometry. |
591 | SRFeat: Single Image Super-Resolution with Feature Discrimination | Seong-Jin Park, Hyeongseok Son, Sunghyun Cho, Ki-Sang Hong, Seungyong Lee | In this paper, we propose a novel GAN-based SISR method that overcomes the limitation and produces more realistic results by attaching an additional discriminator that works in the feature domain. |
592 | Efficient 6-DoF Tracking of Handheld Objects from an Egocentric Viewpoint | Rohit Pandey, Pavel Pidlypenskyi, Shuoran Yang, Christine Kaeser-Chen | We tackle the problem of efficient 6-DoF tracking of a handheld controller from egocentric camera perspectives. We collected the HMD Controller dataset, which consists of over 540,000 stereo image pairs labelled with the full 6-DoF pose of the handheld controller. |
593 | Learning Visual Question Answering by Bootstrapping Hard Attention | Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter Battaglia | Here, we introduce a new approach for hard attention and find it achieves very competitive performance on a recently-released visual question answering dataset, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. |
594 | LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks | Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua | To address this gap, we propose to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization. |
595 | Spatio-Temporal Channel Correlation Networks for Action Classification | Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool | The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). |
596 | Video Summarization Using Fully Convolutional Sequence Networks | Mrigank Rochan, Linwei Ye, Yang Wang | In this paper, we formulate video summarization as a sequence labeling problem. |
597 | Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling | Matthew Trumble, Andrew Gilbert, Adrian Hilton, John Collomosse | We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. |
598 | A Style-Aware Content Loss for Real-time HD Style Transfer | Artsiom Sanakoyeu, Dmytro Kotovenko, Sabine Lang, Bjorn Ommer | To circumvent these issues, we propose a style-aware content loss, which is trained jointly with a deep encoder-decoder network for real-time, high-resolution stylization of images and videos. |
599 | A Zero-Shot Framework for Sketch based Image Retrieval | Sasi Kiran Yelamarthi, Shiva Krishna Reddy, Ashish Mishra, Anurag Mittal | To circumvent this, we propose a generative approach for the SBIR task by proposing deep conditional generative models which take the sketch as an input and fill the missing information stochastically. |
600 | Lambda Twist: An Accurate Fast Robust Perspective Three Point (P3P) Solver | Mikael Persson, Klas Nordberg | We present Lambda Twist; a novel P3P solver which is accurate, fast and robust. |
601 | Multi-modal Cycle-consistent Generalized Zero-Shot Learning | Rafael Felix, Vijay Kumar B. G., Ian Reid, Gustavo Carneiro | In this paper, we propose the use of such a constraint based on a new regularization for the GAN training that forces the generated visual features to reconstruct their original semantic features. |
602 | Modeling Visual Context is Key to Augmenting Object Detection Datasets | Nikita Dvornik, Julien Mairal, Cordelia Schmid | In this work, we go one step further and leverage segmentation annotations to increase the number of object instances present on training data. |
603 | ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks | Qiang Qiu, Jose Lezama, Alex Bronstein, Guillermo Sapiro | In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests. |
604 | Extending Layered Models to 3D Motion | Dong Lao, Ganesh Sundaramoorthi | We consider the problem of inferring a layered representation, its depth ordering and motion segmentation from a video in which objects may undergo 3D non-planar motion relative to the camera. |
605 | Scale-Awareness of Light Field Camera based Visual Odometry | Niclas Zeller, Franz Quint, Uwe Stilla | We propose a novel direct visual odometry algorithm for micro-lens-array-based light field cameras. |
606 | Joint 3D tracking of a deformable object in interaction with a hand | Aggeliki Tsoli, Antonis A. Argyros | We present a novel method that is able to track a complex deformable object in interaction with a hand. |
607 | Local Orthogonal-Group Testing | Ahmet Iscen, Ondrej Chum | Within the group testing framework we propose an efficient off-line construction of the search structures. |
608 | Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network | Qi Ye, Tae-Kyun Kim | In this paper, we tackle the self-occlusion issue and provide a complete description of observed poses given an input depth image by a novel method called hierarchical mixture density networks (HMDN). |
609 | Rolling Shutter Pose and Ego-motion Estimation using Shape-from-Template | Yizhen Lao, Omar Ait-Aider, Adrien Bartoli | Unlike all existing methods which perform 3D-2D registration after augmenting the Global Shutter (GS) projection model with the velocity parameters under various kinematic models, we propose to use local differential constraints. |
610 | Recognition in Terra Incognita | Sara Beery, Grant Van Horn, Pietro Perona | We present a dataset designed to measure recognition generalization to novel environments. |
611 | 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation | Angela Dai, Matthias Niessner | We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D scans using a joint 3D-multi-view prediction network. |
612 | A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines | Pedro Miraldo, Tiago Dias, Srikumar Ramalingam | We propose a minimal solution for pose estimation using both points and lines for a multi-perspective camera. |
613 | Burst Image Deblurring Using Permutation Invariant Convolutional Neural Networks | Miika Aittala, Fredo Durand | We propose a neural approach for fusing an arbitrary-length burst of photographs suffering from severe camera shake and noise into a sharp and noise-free image. |
614 | FishEyeRecNet: A Multi-Context Collaborative Deep Network for Fisheye Image Rectification | Xiaoqing Yin, Xinchao Wang, Jun Yu, Maojun Zhang, Pascal Fua, Dacheng Tao | In this paper, we propose an end-to-end multi-context collaborative deep network for removing distortions from single fisheye images. To facilitate training, we construct a synthesized dataset that covers various scenes and distortion parameter settings. |
615 | Unveiling the Power of Deep Tracking | Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg | In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. |
616 | LSQ++: Lower running time and higher recall in multi-codebook quantization | Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little | Work in MCQ is heavily focused on lowering quantization error, thereby improving distance estimation and recall on benchmarks of visual descriptors at a fixed memory budget. |
617 | HBE: Hand Branch Ensemble Network for Real-time 3D Hand Pose Estimation | Yidan Zhou, Jian Lu, Kuo Du, Xiangbo Lin, Yi Sun, Xiaohong Ma | The goal of this paper is to estimate the 3D coordinates of the hand joints from a single depth image. |
618 | Retrospective Encoders for Video Summarization | Ke Zhang, Kristen Grauman, Fei Sha | In this paper, we propose a novel sequence-to-sequence learning model to address these deficiencies. |
619 | Sequential Clique Optimization for Video Object Segmentation | Yeong Jun Koh, Young-Yoon Lee, Chang-Su Kim | A novel algorithm to segment out objects in a video sequence is proposed in this work. |
620 | Constraint-Aware Deep Neural Network Compression | Changan Chen, Frederick Tung, Naveen Vedula, Greg Mori | We formulate the compression learning problem from the perspective of constrained Bayesian optimization, and introduce a cooling (annealing) strategy to guide the network compression towards the target constraints. |
621 | Linear RGB-D SLAM for Planar Environments | Pyojin Kim, Brian Coltin, H. Jin Kim | We propose a new formulation for including orthogonal planar features as a global model into a linear SLAM approach based on sequential Bayesian filtering. |
622 | Learning Region Features for Object Detection | Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei, Jifeng Dai | This work proposes a general viewpoint that unifies existing region feature extraction methods and a novel method that is end-to-end learnable. |
623 | Video Compression through Image Interpolation | Chao-Yuan Wu, Nayan Singhal, Philipp Krahenbuhl | This paper presents an alternative in an end-to-end deep learning codec. |
624 | Key-Word-Aware Network for Referring Expression Image Segmentation | Hengcan Shi, Hongliang Li, Fanman Meng, Qingbo Wu | To address the aforementioned issues, in this paper, we propose a key-word-aware network, which contains a query attention model and a key-word-aware visual context model. |
625 | LAPRAN: A Scalable Laplacian Pyramid Reconstructive Adversarial Network for Flexible Compressive Sensing Reconstruction | Kai Xu, Zhikang Zhang, Fengbo Ren | We propose a scalable Laplacian pyramid reconstructive adversarial network (LAPRAN) that enables high-fidelity, flexible and fast CS image reconstruction. |
626 | Recurrent Fusion Network for Image Captioning | Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang | In this paper, to exploit the complementary information from multiple encoders, we propose a novel recurrent fusion network (RFNet) for the image captioning task. |
627 | On Regularized Losses for Weakly-supervised CNN Segmentation | Meng Tang, Federico Perazzi, Abdelaziz Djelouah, Ismail Ben Ayed, Christopher Schroers, Yuri Boykov | This paper proposes and experimentally compares different losses integrating MRF/CRF regularization terms. |
628 | Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network | Yao Feng, Fan Wu, Xiaohu Shao, Yanfeng Wang, Xi Zhou | We propose a straightforward method that simultaneously reconstructs the 3D facial structure and provides dense alignment. |
629 | A Segmentation-aware Deep Fusion Network for Compressed Sensing MRI | Zhiwen Fan, Liyan Sun, Xinghao Ding, Yue Huang, Congbo Cai, John Paisley | In this paper, we propose a segmentation-aware deep fusion network called SADFN for compressed sensing MRI. |
630 | End-to-End Deep Structured Models for Drawing Crosswalks | Justin Liang, Raquel Urtasun | In this paper we address the problem of detecting crosswalks from LiDAR and camera imagery. |
631 | Few-Shot Human Motion Prediction via Meta-Learning | Liang-Yan Gui, Yu-Xiong Wang, Deva Ramanan, Jose M. F. Moura | To accomplish this, we propose proactive and adaptive meta-learning (PAML) that introduces a novel combination of model-agnostic meta-learning and model regression networks and unifies them into an integrated, end-to-end framework. |
632 | Correcting the Triplet Selection Bias for Triplet Loss | Baosheng Yu, Tongliang Liu, Mingming Gong, Changxing Ding, Dacheng Tao | In this paper, we propose a new variant of triplet loss, which tries to reduce the bias in triplet sampling by adaptively correcting the distribution shift on sampled triplets. |
633 | 3D Face Reconstruction from Light Field Images: A Model-free Approach | Mingtao Feng, Syed Zulqarnain Gilani, Yaonan Wang, Ajmal Mian | In this paper, we exploit the Epipolar Plane Images (EPI) obtained from light field cameras and learn CNN models that recover horizontal and vertical 3D facial curves from the respective horizontal and vertical EPIs. |
634 | Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering | Medhini Narasimhan, Alexander G. Schwing | To address this issue, we develop a learning-based approach which goes straight to the facts via a learned embedding space. |
635 | Sidekick Policy Learning for Active Visual Exploration | Santhosh K. Ramakrishnan, Kristen Grauman | We introduce sidekick policy learning to capitalize on this imbalance of observability. |
636 | Good Line Cutting: towards Accurate Pose Tracking of Line-assisted VO/VSLAM | Yipu Zhao, Patricio A. Vela | The solution we present is good line cutting, which extracts the most-informative sub-segment from each 3D line for use within the pose optimization formulation. |
637 | Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds | Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah | In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation and localization of people in a given dense crowd image. Since localization requires high-quality images and annotations, we introduce UCF-QNRF dataset that overcomes the shortcomings of previous datasets, and contains 1.25 million humans manually marked with dot annotations. |
638 | Attentive Semantic Alignment with Offset-Aware Correlation Kernels | Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho | In this paper, we introduce an attentive semantic alignment method that focuses on reliable correlations, filtering out distractors. |
639 | “Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention | Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo | In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. |
640 | CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping | Haitian Zheng, Mengqi Ji, Haoqian Wang, Yebin Liu, Lu Fang | To resolve these issues, we present CrossNet, an end-to-end and fully-convolutional deep neural network using cross-scale warping. |
641 | CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps | Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han | To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. |
642 | Single Image Water Hazard Detection using FCN with Reflection Attention Units | Xiaofeng Han, Chuong Nguyen, Shaodi You, Jianfeng Lu | In this paper, we present a water puddle detection method based on a Fully Convolutional Network (FCN) with our newly proposed Reflection Attention Units (RAUs). |
643 | ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation | Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, Hannaneh Hajishirzi | We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. |
644 | Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition | Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding | In this paper, we propose a novel attention-based convolutional neural network (CNN) which regulates multiple object parts among different input images. |
645 | Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection | Lei Zhu, Zijun Deng, Xiaowei Hu, Chi-Wing Fu, Xuemiao Xu, Jing Qin, Pheng-Ann Heng | This paper presents a network to detect shadows by exploring and combining global context in deep layers and local context in shallow layers of a deep convolutional neural network (CNN). |
646 | Where are the blobs: Counting by Localization with Point Supervision | Issam H. Laradji, Negar Rostamzadeh, Pedro O. Pinheiro, David Vazquez, Mark Schmidt | Our contributions are three-fold: (1) we propose a novel loss function that encourages the network to output a single blob per object instance using point-level annotations only; (2) we design two methods for splitting large predicted blobs between object instances; and (3) we show that our method achieves new state-of-the-art results on several challenging datasets including the Pascal VOC and the Penguins dataset. |
647 | Dense Semantic and Topological Correspondence of 3D Faces without Landmarks | Zhenfeng Fan, Xiyuan Hu, Chen Chen, Silong Peng | We propose a general framework for dense correspondence of 3D faces without landmarks in this paper. |
648 | Textual Explanations for Self-Driving Vehicles | Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata | We propose a new approach to introspective explanations which consists of two parts. |
649 | Mancs: A Multi-task Attentional Network with Curriculum Sampling for Person Re-identification | Cheng Wang, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang | We propose a novel deep network called Mancs that solves the person re-identification problem from the following aspects: fully utilizing the attention mechanism for the person misalignment problem and properly sampling for the ranking loss to obtain more stable person representation. |
650 | Efficient Relative Attribute Learning using Graph Neural Networks | Zihang Meng, Nagesh Adluru, Hyunwoo J. Kim, Glenn Fung, Vikas Singh | In this paper, we show how emerging ideas in graph neural networks can yield a unified solution to various problems that broadly fall under relative attribute learning. |
651 | Contemplating Visual Emotions: Understanding and Overcoming Dataset Bias | Rameswar Panda, Jianming Zhang, Haoxiang Li, Joon-Young Lee, Xin Lu, Amit K. Roy-Chowdhury | Based on our analysis, we propose a webly supervised approach by leveraging a large quantity of stock image data. |
652 | Joint & Progressive Learning from High-Dimensional Data for Multi-Label Classification | Danfeng Hong, Naoto Yokoya, Jian Xu, Xiaoxiang Zhu | Despite the fact that nonlinear subspace learning techniques (e.g. manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-samples), and cost-effectiveness (linearization). |
653 | Using Object Information for Spotting Text | Shitala Prasad, Adams Wai Kin Kong | In this paper, a text spotting algorithm based on text and object dependency is proposed. |
654 | MVTec D2S: Densely Segmented Supermarket Dataset | Patrick Follmann, Tobias Bottger, Philipp Hartinger, Rebecca Konig, Markus Ulrich | We introduce the Densely Segmented Supermarket (D2S) dataset, a novel benchmark for instance-aware semantic segmentation in an industrial domain. |
655 | Video Object Detection with an Aligned Spatial-Temporal Memory | Fanyi Xiao, Yong Jae Lee | We introduce Spatial-Temporal Memory Networks for video object detection. |
656 | Asynchronous, Photometric Feature Tracking using Events and Frames | Daniel Gehrig, Henri Rebecq, Guillermo Gallego, Davide Scaramuzza | We present a method that leverages the complementarity of event cameras and standard cameras to track visual features with low-latency. |
657 | Deep Recursive HDRI: Inverse Tone Mapping using Generative Adversarial Networks | Siyeong Lee, Gwon Hwan An, Suk-Ju Kang | We propose a novel method for restoring the lost dynamic range from a single low dynamic range image through a deep neural network. |
658 | DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition | Melih Engin, Lei Wang, Luping Zhou, Xinwang Liu | More importantly, we introduce the Daleckiĭ-Kreĭn formula from Operator theory to give a concise and unified result on differentiating general functions defined on symmetric positive-definite (SPD) matrix, which shows its better numerical stability in conducting backpropagation compared with the existing method when handling the Riemannian geometry of SPD matrix. |
659 | Remote Photoplethysmography Correspondence Feature for 3D Mask Face Presentation Attack Detection | Si-Qi Liu, Xiangyuan Lan, Pong C. Yuen | In this paper, we propose a new liveness feature, called rPPG correspondence feature (CFrPPG) to precisely identify the heartbeat vestige from the observed noisy rPPG signals. |
660 | Fast Light Field Reconstruction With Deep Coarse-To-Fine Modeling of Spatial-Angular Clues | Henry Wing Fung Yeung, Junhui Hou, Jie Chen, Yuk Ying Chung, Xiaoming Chen | In this paper, we propose a learning based algorithm to reconstruct a densely-sampled LF fast and accurately from a sparsely-sampled LF in one forward pass. |
661 | Deep Discriminative Model for Video Classification | Mohammad Tavakolian, Abdenour Hadid | This paper presents a new deep learning approach for video-based scene classification. |
662 | Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image | Zhengqin Li, Kalyan Sunkavalli, Manmohan Chandraker | We propose a material acquisition system that can recover the spatially-varying BRDF and normal map of a near-planar surface from a single image captured by a handheld mobile phone camera. |
663 | Image Reassembly Combining Deep Learning and Shortest Path Problem | Marie-Morgane Paumard, David Picard, Hedi Tabia | The main contributions of this work are: 1) several deep neural architectures to predict the relative position of image fragments that outperform the previous state of the art; 2) casting the reassembly problem into the shortest path in a graph problem for which we provide several construction algorithms depending on available information; 3) a new dataset of images taken from the Metropolitan Museum of Art (MET) dedicated to image reassembly for which we provide a clear setup and a strong baseline. |
664 | Coded Illumination and Imaging for Fluorescence Based Classification | Yuta Asano, Misaki Meguro, Chao Wang, Antony Lam, Yinqiang Zheng, Takahiro Okabe, Imari Sato | In this paper, we propose a coded illumination approach whereby light spectra are learned such that key visual fluorescent features can be easily seen for material classification. |
665 | GANimation: Anatomically-aware Facial Animation from a Single Image | Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer | To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. |
666 | Deep Kalman Filtering Network for Video Compression Artifact Reduction | Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Zhiyong Gao, Ming-Ting Sun | In this paper, we model the video artifact reduction task as a Kalman filtering procedure and restore decoded frames through a deep Kalman filtering network. |
667 | A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment | Roberto Valle, Jose M. Buenaposada, Antonio Valdes, Luis Baumela | In this paper we present DCFE, a real-time facial landmark regression method based on a coarse-to-fine Ensemble of Regression Trees (ERT). |
668 | Deep Expander Networks: Efficient Deep Networks from Graph Theory | Ameya Prabhu, Girish Varma, Anoop Namboodiri | Inspired by these techniques, we propose to model connections between filters of a CNN using graphs which are simultaneously sparse and well connected. |
669 | Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation | Hyojin Bahng, Seungjoo Yoo, Wonwoong Cho, David Keetae Park, Ziming Wu, Xiaojuan Ma, Jaegul Choo | This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. For this task, we introduce our manually curated dataset called Palette-and-Text (PAT). |
670 | BusterNet: Detecting Copy-Move Image Forgery with Source/Target Localization | Yue Wu, Wael Abd-Almageed, Prem Natarajan | We introduce a novel deep neural architecture for image copy-move forgery detection (CMFD), code-named BusterNet. |
671 | Task-Aware Image Downscaling | Heewon Kim, Myungsub Choi, Bee Lim, Kyoung Mu Lee | In this paper, we present a novel operation called task-aware image downscaling to support an upscaling task. |
672 | Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition | Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You | In this paper, we present a novel model to address these issues. |
673 | Self-Calibration of Cameras with Euclidean Image Plane in Case of Two Views and Known Relative Rotation Angle | Evgeniy Martyushev | In this paper, we propose a non-iterative self-calibration algorithm for a camera with Euclidean image plane in case the remaining three internal parameters — the focal length and the principal point coordinates — are fixed but unknown. |
674 | To learn image super-resolution, use a GAN to learn how to do image degradation first | Adrian Bulat, Jing Yang, Georgios Tzimiropoulos | To circumvent this problem, we propose a two-stage process which firstly trains a High-to-Low Generative Adversarial Network (GAN) to learn how to degrade and downsample high-resolution images requiring, during training, only unpaired high and low-resolution images. |
675 | Multi-scale Residual Network for Image Super-Resolution | Juncheng Li, Faming Fang, Kangfu Mei, Guixu Zhang | In this paper, we propose a novel multi-scale residual network (MSRN) to fully exploit the image features, which outperform most of the state-of-the-art methods. |
676 | Efficient Global Point Cloud Registration by Matching Rotation Invariant Features Through Translation Search | Yinlong Liu, Chen Wang, Zhijian Song, Manning Wang | In this paper, we decouple the optimization of translation and rotation, and we propose a fast BnB algorithm to globally optimize the 3D translation parameter first. |
677 | FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans | Chen Liu, Jiaye Wu, Yasutaka Furukawa | The ultimate goal of this indoor mapping research is to automatically reconstruct a floorplan simply by walking through a house with a smartphone in a pocket. We have created a benchmark for floorplan reconstruction by acquiring RGBD video streams for 155 residential houses or apartments with Google Tango phones and annotating complete floorplan information. |
678 | Facial Dynamics Interpreter Network: What are the Important Relations between Local Dynamics for Facial Trait Estimation? | Seong Tae Kim, Yong Man Ro | In this paper, a novel deep learning approach has been proposed to interpret the important relations between local dynamics for estimating facial traits from expression sequence. |
679 | Transferring GANs: generating images from limited data | Yaxing Wang, Chenshen Wu, Luis Herranz, Joost van de Weijer, Abel Gonzalez-Garcia, Bogdan Raducanu | Therefore, we study domain adaptation applied to image generation with generative adversarial networks. |
680 | A Dataset for Lane Instance Segmentation in Urban Environments | Brook Roberts, Sebastian Kaltwang, Sina Samangooei, Mark Pender-Bare, Konstantinos Tertikas, John Redford | Therefore, we propose a semi-automated method that allows for efficient labelling of image sequences by utilising an estimated road plane in 3D based on where the car has driven and projecting labels from this plane into all images of the sequence. We are releasing a dataset of 24,000 images and additionally show experimental semantic segmentation and instance segmentation results. |
681 | Visual Question Generation for Class Acquisition of Unknown Objects | Kohei Uehara, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, Tatsuya Harada | In this paper, we propose a method for generating questions about unknown objects in an image, as means to get information about classes that have not been learned. |
682 | DeepVS: A Deep Learning Based Video Saliency Prediction Approach | Lai Jiang, Mai Xu, Tie Liu, Minglang Qiao, Zulin Wang | In this paper, we propose a novel deep learning based video saliency prediction method, named DeepVS. |
683 | Saliency Preservation in Low-Resolution Grayscale Images | Shivanthan Yohanandan, Andy Song, Adrian G. Dyer, Dacheng Tao | In this study, we explain the biological and computational motivation for LG, and show, through a range of human eye-tracking and computational modeling experiments, that saliency information is preserved in LG images. |
684 | Pairwise Relational Networks for Face Recognition | Bong-Nam Kang, Yonghyun Kim, Daijin Kim | To investigate the effective features for face recognition, we propose a novel face recognition method, called a pairwise relational network (PRN), that obtains local appearance patches around landmark points on the feature map, and captures the pairwise relation between a pair of local appearance patches. |
685 | Proxy Clouds for Live RGB-D Stream Processing and Consolidation | Adrien Kaiser, Jose Alonso Ybanez Zepeda, Tamy Boubekeur | We propose a new multiplanar superstructure for unified real-time processing of RGB-D data. |
686 | U-PC: Unsupervised Planogram Compliance | Archan Ray, Nishant Kumar, Avishek Shaw, Dipti Prasad Mukherjee | We present an end-to-end solution for recognizing merchandise displayed in the shelves of a supermarket. |
687 | Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World | Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, Rita Cucchiara | For this purpose, we propose a deep network architecture that jointly extracts people body parts and associates them across short temporal spans. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we created the vastest Computer Graphics dataset for people tracking in urban scenarios by exploiting a photorealistic videogame. |
688 | Deep Metric Learning with Hierarchical Triplet Loss | Weifeng Ge | We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information. |
689 | Efficient Dense Point Cloud Object Reconstruction using Deformation Vector Fields | Kejie Li, Trung Pham, Huangying Zhan, Ian Reid | We propose a novel approach that addresses this limitation by replacing masks with “deformation-fields”. |
690 | DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation | Bharath Bhushan Damodaran, Benjamin Kellenberger, Remi Flamary, Devis Tuia, Nicolas Courty | In this work we explore a solution, named DeepJDOT, to tackle this problem: through a measure of discrepancy on joint deep representations/labels based on optimal transport, we not only learn new data representations aligned between the source and target domain, but also simultaneously preserve the discriminative information used by the classifier. |
691 | Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization | Daniel Jakubovitz, Raja Giryes | In this work, we suggest a theoretically inspired novel approach to improve the networks’ robustness. |
692 | Joint Learning of Intrinsic Images and Semantic Segmentation | Anil S. Baslamisli, Thomas T. Groenestege, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers | Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. |
693 | Recurrent Tubelet Proposal and Recognition Networks for Action Detection | Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei | Specifically, we present a novel deep architecture called Recurrent Tubelet Proposal and Recognition (RTPR) networks to incorporate temporal context for action detection. |
694 | Domain transfer through deep activation matching | Haoshuo Huang, Qixing Huang, Philipp Krahenbuhl | We introduce a layer-wise unsupervised domain adaptation approach for the task of semantic segmentation. |
695 | Towards Privacy-Preserving Visual Recognition via Adversarial Training: A Pilot Study | Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, Hailin Jin | This paper aims to improve privacy-preserving visual recognition, an increasingly demanded feature in smart camera applications, by formulating a unique adversarial training framework. |
696 | Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera | Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, Gerard Pons-Moll | In this work, we propose a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached at the body limbs to estimate accurate 3D poses in the wild. |
697 | Beyond local reasoning for stereo confidence estimation with deep learning | Fabio Tosi, Matteo Poggi, Antonio Benincasa, Stefano Mattoccia | Therefore, in this paper, we propose to exploit nearby and farther clues available from image and disparity domains to obtain a more accurate confidence estimation. |
698 | Self-supervised Knowledge Distillation Using Singular Value Decomposition | Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song | To improve the quality of the transferred knowledge from T-DNN, we propose a new knowledge distillation using singular value decomposition (SVD). |
699 | Implicit 3D Orientation Learning for 6D Object Detection from RGB Images | Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel | We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. |
700 | Concept Mask: Large-Scale Segmentation from Semantic Concepts | Yufei Wang, Zhe Lin, Xiaohui Shen, Jianming Zhang, Scott Cohen | We formulate semantic segmentation as a problem of image segmentation given a semantic concept, and propose a novel system which can potentially handle an unlimited number of concepts, including objects, parts, stuff, and attributes. |
701 | Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net | Xingang Pan, Ping Luo, Jianping Shi, Xiaoou Tang | Unlike existing works that designed CNN architectures to improve performance on a single task of a single domain and not generalizable, we present IBN-Net, a novel convolutional architecture, which remarkably enhances a CNN’s modeling ability on one domain (e.g. Cityscapes) as well as its generalization capacity on another domain (e.g. GTA5) without finetuning. |
702 | Adaptively Transforming Graph Matching | Fudong Wang, Nan Xue, Yipeng Zhang, Xiang Bai, Gui-Song Xia | In this paper, we introduce an adaptively transforming graph matching (ATGM) method from the perspective of functional representation. |
703 | Deep Continuous Fusion for Multi-Sensor 3D Object Detection | Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun | In this paper, we propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization. |
704 | PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence | Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn | This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. |
705 | Multimodal image alignment through a multiscale chain of neural networks with application to remote sensing | Armand Zampieri, Guillaume Charpiat, Nicolas Girard, Yuliya Tarabalka | We tackle here the problem of multimodal image non-rigid registration, which is of prime importance in remote sensing and medical imaging. |
706 | Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline) | Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang | Instead of using external resources like pose estimator, we consider content consistency within each part for precise part location. |
707 | Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers | Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke | In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. |
708 | Start, Follow, Read: End-to-End Full-Page Handwriting Recognition | Curtis Wigington, Chris Tensmeyer, Brian Davis, William Barrett, Brian Price, Scott Cohen | Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. |
709 | PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities | Lan Wang, Chenqiang Gao, Luyu Yang, Yue Zhao, Wangmeng Zuo, Deyu Meng | In this paper, we propose a novel Partial-modal Generative Adversarial Networks (PM-GANs) that learns a full-modal representation using data from only partial modalities. |
710 | Adversarial Geometry-Aware Human Motion Prediction | Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura | Specifically, rather than using the conventional Euclidean loss, we propose a novel frame-wise geodesic loss as a geometrically meaningful, more precise distance measurement. |
711 | WildDash – Creating Hazard-Aware Benchmarks | Oliver Zendel, Katrin Honauer, Markus Murschitz, Daniel Steininger, Gustavo Fernandez Dominguez | In this work, we present a new test dataset for semantic and instance segmentation for the automotive domain. |
712 | RefocusGAN: Scene Refocusing using a Single Image | Parikshit Sakurikar, Ishit Mehta, Vineeth N. Balasubramanian, P. J. Narayanan | We introduce RefocusGAN, a deblur-then-reblur approach to single image refocusing. |
713 | Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving | Peiliang Li, Tong Qin, Shaojie Shen | We propose a stereo vision-based approach for tracking the camera ego-motion and 3D semantic objects in dynamic autonomous driving scenarios. |
714 | Zero-shot keyword spotting for visual speech recognition in-the-wild | Themos Stafylakis, Georgios Tzimiropoulos | Different to prior works on KWS, which try to learn word representations merely from sequences of graphemes (i.e. letters), we propose the use of a grapheme-to-phoneme encoder-decoder model which learns how to map words to their pronunciation. |
715 | Learning Efficient Single-stage Pedestrian Detectors by Asymptotic Localization Fitting | Wei Liu, Shengcai Liao, Weidong Hu, Xuezhi Liang, Xiao Chen | Learning Efficient Single-stage Pedestrian Detectors by Asymptotic Localization Fitting |
716 | Generative Adversarial Network with Spatial Attention for Face Attribute Editing | Gang Zhang, Meina Kan, Shiguang Shan, Xilin Chen | Therefore, we introduce the spatial attention mechanism into GAN framework (referred to as SaGAN), to only alter the attribute-specific region and keep the rest unchanged. |
717 | Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset | Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri | This paper introduces a large-scale, multi-label and multitask video dataset named Scenes-Objects-Actions (SOA). |
718 | Descending, lifting or smoothing: Secrets of robust cost optimization | Christopher Zach, Guillaume Bourmaud | In this work we identify three classes of deterministic second-order algorithms that are able to tackle this type of optimization problem: direct approaches that aim to optimize the robust cost directly with a second order method, lifting-based approaches that add so called lifting variables to embed the given robust cost function into a higher dimensional space, and graduated optimization methods that solve a sequence of smoothed cost functions. |
719 | Deep Bilevel Learning | Simon Jenni, Paolo Favaro | We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent. |
720 | Realtime Time Synchronized Event-based Stereo | Alex Zihao Zhu, Yibo Chen, Kostas Daniilidis | In this work, we propose a novel event based stereo method which addresses the problem of motion blur for a moving event camera. |
721 | Understanding Perceptual and Conceptual Fluency at a Large Scale | Shengli Hu, Ali Borji | We propose and provide estimation methods based on training DCNNs to extract and evaluate two independent constructs for designs: perceptual distinctiveness (“perceptual fluency” metrics) and ambiguity in meaning (“conceptual fluency” metrics) of each logo. We create a dataset of 543,758 logo designs spanning 39 industrial categories and 216 countries. |
722 | Structure-from-Motion-Aware PatchMatch for Adaptive Optical Flow Estimation | Daniel Maurer, Nico Marniok, Bastian Goldluecke, Andres Bruhn | In the present paper, we tackle this problem. |
723 | Unsupervised Learning of Multi-Frame Optical Flow with Occlusions | Joel Janai, Fatma Guney, Anurag Ranjan, Michael Black, Andreas Geiger | In this paper, we propose a framework for unsupervised learning of optical flow and occlusions over multiple frames. |
724 | Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images | Keisuke Tateno, Nassir Navab, Federico Tombari | To fill this gap, we propose a learning approach for panoramic depth map estimation from a single image. |
725 | Accelerating Dynamic Programs via Nested Benders Decomposition with Application to Multi-Person Pose Estimation | Shaofei Wang, Alexander Ihler, Konrad Kording, Julian Yarkony | We present a novel approach to solve dynamic programs (DP), which are frequent in computer vision, on tree-structured graphs with exponential node state space. |
726 | OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas | Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Petros Daras | In this work, we circumvent the challenges associated with acquiring high-quality 360° datasets with ground truth depth annotations, by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering. |
727 | Joint optimization for compressive video sensing and reconstruction under hardware constraints | Michitaka Yoshida, Akihiko Torii, Masatoshi Okutomi, Kenta Endo, Yukinobu Sugiyama, Rin-ichiro Taniguchi, Hajime Nagahara | In this paper, we propose a method of jointly optimizing the exposure patterns of compressive sensing and the reconstruction framework under hardware constraints. |
728 | A+D Net: Training a Shadow Detector with Adversarial Shadow Attenuation | Hieu Le, Tomas F. Yago Vicente, Vu Nguyen, Minh Hoai, Dimitris Samaras | We propose a novel GAN-based framework for detecting shadows in images, in which a shadow detection network (D-Net) is trained together with a shadow attenuation network (A-Net) that generates adversarial training examples. |
729 | Simple Baselines for Human Pose Estimation and Tracking | Bin Xiao, Haiping Wu, Yichen Wei | This work provides simple and effective baseline methods. |
730 | Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance | Zhixin Shu, Mihir Sahasrabudhe, Riza Alp Guler, Dimitris Samaras, Nikos Paragios, Iasonas Kokkinos | In this work we introduce the Deforming Autoencoder, a generative model for images that disentangles shape from appearance in a latent representation space that is learned in a fully unsupervised manner. |
731 | Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification | Eric Muller-Budack, Kader Pustu-Iren, Ralph Ewerth | In this paper, we introduce several deep learning methods, which pursue the latter approach and treat geolocalization as a classification problem where the earth is subdivided into geographical cells. |
732 | Universal Sketch Perceptual Grouping | Ke Li, Kaiyue Pang, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Honggang Zhang | In this work we aim to develop a universal sketch grouper. |
733 | License Plate Detection and Recognition in Unconstrained Scenarios | Sergio Montazzolli Silva, Claudio Rosito Jung | This work proposes a complete ALPR system focusing on unconstrained capture scenarios, where the LP might be considerably distorted due to oblique views. |
734 | Affine Correspondences between Central Cameras for Rapid Relative Pose Estimation | Ivan Eichhardt, Dmitry Chetverikov | This paper presents a novel algorithm to estimate the relative pose, i.e. the 3D rotation and translation of two cameras, from two affine correspondences (ACs) considering any central camera model. |
735 | ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases | Pierre Stock, Moustapha Cisse | The contribution of this study is threefold. |
736 | Human Motion Analysis with Deep Metric Learning | Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari | Specifically, we propose (1) a novel metric learning objective based on a triplet architecture and Maximum Mean Discrepancy, and (2) a novel deep architecture based on attentive recurrent neural networks. |
737 | Real-to-Virtual Domain Unification for End-to-End Autonomous Driving | Luona Yang, Xiaodan Liang, Tairui Wang, Eric Xing | In this work, we address the above limitations by taking advantage of virtual data collected from driving simulators, and present DU-drive, an unsupervised real-to-virtual domain unification framework for end-to-end autonomous driving. |
738 | Imagine This! Scripts to Compositions to Videos | Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi | As a step towards this goal, we present the Composition Retrieval and Fusion Networks (CRAFT), a model capable of learning this knowledge from video-caption data and applying it for generating videos from novel captions. |
739 | Exploring Visual Relationship for Image Captioning | Ting Yao, Yingwei Pan, Yehao Li, Tao Mei | In this paper, we introduce a new design to explore the connections between objects for image captioning under the umbrella of attention-based encoder-decoder framework. |
740 | ExplainGAN: Model Explanation via Decision Boundary Crossing Transformations | Pouya Samangouei, Ardavan Saeedi, Liam Nakagawa, Nathan Silberman | We introduce a new method for interpreting computer vision models: visually perceptible, decision-boundary crossing transformations. |
741 | RESOUND: Towards Action Recognition without Representation Bias | Yingwei Li, Yi Li, Nuno Vasconcelos | RESOUND: Towards Action Recognition without Representation Bias |
742 | Fast and Accurate Camera Covariance Computation for Large 3D Reconstruction | Michal Polic, Wolfgang Forstner, Tomas Pajdla | We present a new algorithm which exploits the sparsity of the uncertainty propagation and speeds up the computation about ten times with respect to previous approaches. |
743 | Deep Randomized Ensembles for Metric Learning | Hong Xuan, Richard Souvenir, Robert Pless | In this work, we propose a novel, generalizable and fast method to define a family of embedding functions that can be used as an ensemble to give improved results. |
744 | The Mutex Watershed: Efficient, Parameter-Free Image Partitioning | Steffen Wolf, Constantin Pape, Alberto Bailoni, Nasim Rahaman, Anna Kreshuk, Ullrich Kothe, Fred A. Hamprecht | Here, we propose an algorithm with empirically linearithmic complexity. |
745 | Integral Human Pose Regression | Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, Yichen Wei | This work shows that a simple integral operation relates and unifies the heat map representation and joint regression, thus avoiding the above issues. |
746 | Quadtree Convolutional Neural Networks | Pradeep Kumar Jayaraman, Jianhan Mei, Jianfei Cai, Jianmin Zheng | This paper presents a Quadtree Convolutional Neural Network (QCNN) for efficiently learning from image datasets representing sparse data such as handwriting, pen strokes, freehand sketches, etc. |
747 | Urban Zoning Using Higher-Order Markov Random Fields on Multi-View Imagery Data | Tian Feng, Quang-Trung Truong, Duc Thanh Nguyen, Jing Yu Koh, Lap-Fai Yu, Alexander Binder, Sai-Kit Yeung | This paper proposes a method for automatic urban zoning using higher-order Markov random fields (HO-MRF) built on multi-view imagery data including street-view photos and top-view satellite images. |
748 | Self-produced Guidance for Weakly-supervised Object Localization | Xiaolin Zhang, Yunchao Wei, Guoliang Kang, Yi Yang, Thomas Huang | We propose to generate Self-produced Guidance (SPG) masks which separate the foreground, the object of interest, from the background to provide the classification networks with spatial correlation information of pixels. |
749 | ECO: Efficient Convolutional Network for Online Video Understanding | Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox | In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. |
750 | Multi-Scale Structure-Aware Network for Human Pose Estimation | Lipeng Ke, Ming-Ching Chang, Honggang Qi, Siwei Lyu | Our method can effectively improve state-of-the-art pose estimation methods that suffer from difficulties in scale varieties, occlusions, and complex multi-person scenarios. |
751 | Does Haze Removal Help CNN-based Image Classification? | Yanting Pei, Yaping Huang, Qi Zou, Yuhang Lu, Song Wang | In this paper, we empirically study this problem in the important task of image classification by using both synthetic and real hazy image datasets. |
752 | Quaternion Convolutional Neural Networks | Xuanyu Zhu, Yi Xu, Hongteng Xu, Changjian Chen | Focusing on color images, which can be naturally represented as quaternion matrices, we propose a quaternion convolutional neural network (QCNN) model to obtain more representative features. |
753 | Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation | Eddy Ilg, Tonmoy Saikia, Margret Keuper, Thomas Brox | In this paper, we present an efficient learning-based approach to estimate occlusion areas jointly with optical flow or disparities. |
754 | Single Shot Scene Text Retrieval | Lluis Gomez, Andres Mafla, Marcal Rusinol, Dimosthenis Karatzas | In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. |
755 | Learning to Predict Crisp Boundaries | Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, Xinru Liu | In this work, the aim is to make CNNs produce sharp boundaries without post-processing. |
756 | Diverse and Coherent Paragraph Generation from Images | Moitreya Chatterjee, Alexander G. Schwing | To address those challenges, we propose to augment paragraph generation techniques with “coherence vectors,” “global topic vectors,” and modeling of the inherent ambiguity of associating paragraphs with images, via a variational auto-encoder formulation. |
757 | Folded Recurrent Neural Networks for Future Video Prediction | Marc Oliu, Javier Selva, Sergio Escalera | This work introduces double-mapping Gated Recurrent Units (dGRU), an extension of standard GRUs where the input is considered as a recurrent state. |
758 | Image Manipulation with Perceptual Discriminators | Diana Sungatullina, Egor Zakharov, Dmitry Ulyanov, Victor Lempitsky | In this work, we show how these two ideas can be combined in a principled and non-additive manner for unaligned image translation tasks. |
759 | DeepTAM: Deep Tracking and Mapping | Huizhong Zhou, Benjamin Ummenhofer, Thomas Brox | We present a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned. |
760 | W-TALC: Weakly-supervised Temporal Activity Localization and Classification | Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury | Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. |
761 | Is Robustness the Cost of Accuracy? — A Comprehensive Study on the Robustness of 18 Deep Image Classification Models | Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, Yupeng Gao | To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate and transferability of adversarial examples between 306 pairs of models. |
762 | 3D Ego-Pose Estimation via Imitation Learning | Ye Yuan, Kris Kitani | Motivated by this, we propose a novel control-based approach to model human motion with physics simulation and use imitation learning to learn a video-conditioned control policy for ego-pose estimation. |
763 | Supervising the new with the old: learning SFM from SFM | Maria Klodt, Andrea Vedaldi | In this paper, we propose a number of improvements to these approaches. |
764 | Towards Realistic Predictors | Pei Wang, Nuno Vasconcelos | In this paper, we talk about a particular case of it, realistic classifiers. |
765 | Value-aware Quantization for Training and Inference of Neural Networks | Eunhyeok Park, Sungjoo Yoo, Peter Vajda | We present new techniques to apply the proposed quantization to training and inference. |
766 | Structural Consistency and Controllability for Diverse Colorization | Safa Messaoud, David Forsyth, Alexander G. Schwing | To address this issue, we develop a conditional random field based variational auto-encoder formulation which is able to achieve diversity while taking into account structural consistency. |
767 | A Dataset and Architecture for Visual Reasoning with a Working Memory | Guangyu Robert Yang, Igor Ganichev, Xiao-Jing Wang, Jonathon Shlens, David Sussillo | Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question and answer dataset (COG) to parallel experiments in humans and animals. Preliminary analyses of the network architectures trained on COG demonstrate that the network accomplishes the task in a manner interpretable to humans. |
768 | From Face Recognition to Models of Identity: A Bayesian Approach to Learning about Unknown Identities from Unsupervised Data | Daniel Coelho de Castro, Sebastian Nowozin | We propose an integrated Bayesian model that coherently reasons about the observed images, identities, partial knowledge about names, and the situational context of each observation. |
769 | Open Set Learning with Counterfactual Images | Lawrence Neal, Matthew Olson, Xiaoli Fern, Weng-Keen Wong, Fuxin Li | To detect unknown classes while still generalizing to new instances of existing classes, we introduce a dataset augmentation technique that we call counterfactual image generation. |
770 | Fully-Convolutional Point Networks for Large-Scale Point Clouds | Dario Rethage, Johanna Wald, Jurgen Sturm, Nassir Navab, Federico Tombari | This work proposes a general-purpose, fully-convolutional network architecture for efficiently processing large-scale 3D data. |
771 | Improving Shape Deformation in Unsupervised Image-to-Image Translation | Aaron Gokaslan, Vivek Ramanujan, Daniel Ritchie, Kwang In Kim, James Tompkin | Inspired by semantic segmentation, we introduce a discriminator with dilated convolutions which is able to use information from across the entire image to train a more context-aware generator. |
772 | SwapNet: Garment Transfer in Single View Images | Amit Raj, Patsorn Sangkloy, Huiwen Chang, Jingwan Lu, Duygu Ceylan, James Hays | We present SwapNet, a framework to transfer garments across images of people with arbitrary body pose, shape, and clothing. |
773 | Learning SO(3) Equivariant Representations with Spherical CNNs | Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, Kostas Daniilidis | We model 3D data with multi-valued spherical functions and we propose a novel spherical convolutional network that implements exact convolutions on the sphere by realizing them in the spherical harmonic domain. |
774 | Multiple-gaze geometry: Inferring novel 3D locations from gazes observed in monocular video | Ernesto Brau, Jinyan Guan, Tanya Jeffries, Kobus Barnard | We provide a Bayesian generative model for the temporal scene that captures the joint probability of camera parameters, locations of people, their gaze, what they are looking at, and locations of visual attention. |
775 | Constrained Optimization Based Low-Rank Approximation of Deep Neural Networks | Chong Li, C. J. Richard Shi | We present COBLA—Constrained Optimization Based Low-rank Approximation—a systematic method of finding an optimal low-rank approximation of a trained convolutional neural network, subject to constraints in the number of multiply-accumulate (MAC) operations and the memory footprint. |
776 | Stereo relative pose from line and point feature triplets | Alexander Vakhitov, Victor Lempitsky, Yinqiang Zheng | In this work we present two minimal solvers for stereo relative pose. |