Paper Digest: ECCV 2020 Highlights
Readers are also encouraged to read our ECCV 2020 Papers with Code/Data Page, which lists those papers that have published their code or data.
The European Conference on Computer Vision (ECCV) is one of the top computer vision conferences in the world. In 2020, it is to be held virtually due to the COVID-19 pandemic.
To help the community quickly catch up on the work presented in this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: Paper Digest: ECCV 2020 Highlights
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Quaternion Equivariant Capsule Networks for 3D Point Clouds | Yongheng Zhao; Tolga Birdal; Jan Eric Lenssen; Emanuele Menegatti; Leonidas Guibas; Federico Tombari; | We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. |
2 | DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares | Yizhak Ben-Shabat; Stephen Gould; | We propose a surface fitting method for unstructured 3D point clouds. |
3 | NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search | Zhichao Lu; Kalyanmoy Deb; Erik Goodman; Wolfgang Banzhaf; Vishnu Naresh Boddeti; | In this paper, we propose an efficient NAS algorithm for generating task-specific models that are competitive under multiple competing objectives. |
4 | Describing Textures using Natural Language | Chenyun Wu; Mikayla Timm; Subhransu Maji; | In this paper, we study the problem of describing visual attributes of texture on a novel dataset containing rich descriptions of textures, and conduct a systematic study of current generative and discriminative models for grounding language to images on this dataset. |
5 | Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition | Rizard Renanda Adhi Pramono; Yie Tarng Chen; Wen Hsien Fang; | This paper presents a novel relational network for group activity recognition. |
6 | AiR: Attention with Reasoning Capability | Shi Chen; Ming Jiang; Jinhui Yang; Qi Zhao; | In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. |
7 | Self6D: Self-Supervised Monocular 6D Object Pose Estimation | Gu Wang; Fabian Manhardt; Jianzhun Shao; Xiangyang Ji; Nassir Navab; Federico Tombari; | To overcome this shortcoming, we propose the idea of monocular 6D pose estimation by means of self-supervised learning, removing the need for real annotations. |
8 | Invertible Image Rescaling | Mingqing Xiao; Shuxin Zheng; Chang Liu; Yaolong Wang; Di He; Guolin Ke; Jiang Bian; Zhouchen Lin; Tie-Yan Liu; | In this work, we propose to solve this problem by modeling the downscaling and upscaling processes from a new perspective, i.e. an invertible bijective transformation, which can largely mitigate the ill-posed nature of image upscaling. |
9 | Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation | Yingda Xia; Yi Zhang; Fengze Liu; Wei Shen; Alan L. Yuille; | In this paper, we systematically study failure and anomaly detection for semantic segmentation and propose a unified framework, consisting of two modules, to address these two related problems. |
10 | House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation | Nelson Nauata; Kai-Hung Chang; Chin-Yi Cheng; Greg Mori; Yasutaka Furukawa; | This paper proposes a novel graph-constrained generative adversarial network, whose generator and discriminator are built upon relational architecture. |
11 | Crowdsampling the Plenoptic Function | Zhengqi Li; Wenqi Xian; Abe Davis; Noah Snavely; | In this paper, we present a new approach to novel view synthesis under time-varying illumination from such data. |
12 | VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment | Hanyue Tu; Chunyu Wang; Wenjun Zeng; | We present VoxelPose to estimate 3D poses of multiple people from multiple camera views. |
13 | End-to-End Object Detection with Transformers | Nicolas Carion; Francisco Massa; Gabriel Synnaeve; Nicolas Usunier; Alexander Kirillov; Sergey Zagoruyko; | We present a new method that views object detection as a direct set prediction. |
14 | DeepSFM: Structure From Motion Via Deep Bundle Adjustment | Xingkui Wei; Yinda Zhang; Zhuwen Li; Yanwei Fu; Xiangyang Xue; | In this work, we design a physical driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost volume based architectures for depth and pose estimation respectively, iteratively running to improve both. |
15 | Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry | Yifan Xu; Tianqi Fan; Yi Yuan; Gurprit Singh; | Based on Farthest Point Sampling algorithm, we propose a sampling scheme that theoretically encourages better generalization performance, and results in fast convergence for SGD-based optimization algorithms. |
16 | Segment as Points for Efficient Online Multi-Object Tracking and Segmentation | Zhenbo Xu; Wei Zhang; Xiao Tan; Wei Yang; Huan Huang; Shilei Wen; Errui Ding; Liusheng Huang; | In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to un-ordered 2D point cloud representation. |
17 | Conditional Convolutions for Instance Segmentation | Zhi Tian; Chunhua Shen; Hao Chen; | We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). |
18 | MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution | Taojiannan Yang; Sijie Zhu; Chen Chen; Shen Yan; Mi Zhang; Andrew Willis; | We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime. |
19 | Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset | Menglin Jia; Mengyun Shi; Mikhail Sirotenko; Yin Cui; Claire Cardie; Bharath Hariharan; Hartwig Adam; Serge Belongie; | In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. |
20 | Privacy Preserving Structure-from-Motion | Marcel Geppert; Viktor Larsson; Pablo Speciale; Johannes L. Schönberger; Marc Pollefeys; | In this paper, we further build upon this idea and propose solutions to the different core algorithms of an incremental Structure-from-Motion pipeline based on random line features. |
21 | Rewriting a Deep Generative Model | David Bau; Steven Liu; Tongzhou Wang; Jun-Yan Zhu; Antonio Torralba; | In this paper, we introduce a new problem setting: manipulation of specific rules encoded by a deep generative model. |
22 | Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets | Jiuniu Wang; Wenjia Xu; Qingzhong Wang; Antoni B. Chan; | In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. |
23 | Long-term Human Motion Prediction with Scene Context | Zhe Cao; Hang Gao; Karttikeya Mangalam; Qi-Zhi Cai; Minh Vo; Jitendra Malik; | In this work, we propose a novel three-stage framework that exploits scene context to tackle this task. |
24 | NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | Ben Mildenhall; Pratul P. Srinivasan; Matthew Tancik; Jonathan T. Barron; Ravi Ramamoorthi; Ren Ng; | We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. |
25 | ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes | Panos Achlioptas; Ahmed Abdelreheem; Fei Xia; Mohamed Elhoseiny; Leonidas Guibas; | In this work we study the problem of using referential language to identify common objects in real-world 3D scenes. |
26 | MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images | Benjamin Attal; Selena Ling; Aaron Gokaslan; Christian Richardt; James Tompkin; | We introduce a method to convert stereo 360 (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six degree-of-freedom (6DoF) rendering. |
27 | Learning and Aggregating Deep Local Descriptors for Instance-level Recognition | Giorgos Tolias; Tomas Jenicek; Ondřej Chum; | We propose an efficient method to learn deep local descriptors for instance-level recognition. |
28 | A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem | George Terzakis; Manolis Lourakis; | An approach for estimating the pose of a camera given a set of 3D points and their corresponding 2D image projections is presented. |
29 | Learn to Recover Visible Color for Video Surveillance in a Day | Guangming Wu; Yinqiang Zheng; Zhiling Guo; Zekun Cai; Xiaodan Shi; Xin Ding; Yifei Huang; Yimin Guo; Ryosuke Shibasaki; | In this paper, we present a deep learning based approach that directly generates human-friendly, visible color for video surveillance in a day. |
30 | Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images | Heming Zhu; Yu Cao; Hang Jin; Weikai Chen; Dong Du; Zhangye Wang; Shuguang Cui; Xiaoguang Han; | We propose to fill this gap by introducing DeepFashion3D, the largest collection to date of 3D garment models, with the goal of establishing a novel benchmark and dataset for the evaluation of image-based garment reconstruction systems. |
31 | Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation | Zhenda Xie; Zheng Zhang; Xizhou Zhu; Gao Huang; Stephen Lin; | Towards reducing this superfluous computation, we propose to compute features only at sparsely sampled locations, which are probabilistically chosen according to activation responses, and then densely reconstruct the feature map with an efficient interpolation procedure. |
32 | BorderDet: Border Feature for Dense Object Detection | Han Qiu; Yuchen Ma; Zeming Li; Songtao Liu; Jian Sun; | In this paper, we propose a simple and efficient operator called Border-Align to extract “border features” from the extreme point of the border to enhance the point feature. |
33 | Regularization with Latent Space Virtual Adversarial Training | Genki Osada; Budrul Ahsan; Revoti Prasad Bora; Takashi Nishide; | To address this problem we propose LVAT (Latent space VAT), which injects perturbation in the latent space instead of the input space. |
34 | Du²Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels | Yinda Zhang; Neal Wadhwa; Sergio Orts-Escolano; Christian Häne; Sean Fanello; Rahul Garg; | We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor, which is increasingly common on consumer cameras. |
35 | Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning | Jaekyeom Kim; Hyoungseok Kim; Gunhee Kim; | We propose a model-agnostic method that improves the test-time performance of any few-shot learning models with no additional training, and thus is free from the training-test domain gap. |
36 | Targeted Attack for Deep Hashing based Retrieval | Jiawang Bai; Bin Chen; Yiming Li; Dongxian Wu; Weiwei Guo; Shu-Tao Xia; En-Hui Yang; | In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval. |
37 | Gradient Centralization: A New Optimization Technique for Deep Neural Networks | Hongwei Yong; Jianqiang Huang; Xiansheng Hua; Lei Zhang; | Different from those previous methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. |
38 | Content-Aware Unsupervised Deep Homography Estimation | Jirong Zhang; Chuan Wang; Shuaicheng Liu; Lanpeng Jia; Nianjin Ye; Jue Wang; Ji Zhou; Jian Sun; | To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design. |
39 | Multi-View Optimization of Local Feature Geometry | Mihai Dusmanu; Johannes L. Schönberger; Marc Pollefeys; | In this work, we address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry. |
40 | The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization | Jingjing Shen; Thomas J. Cashman; Qi Ye; Tim Hutton; Toby Sharp; Federica Bogo; Andrew Fitzgibbon; Jamie Shotton; | To solve model-fitting problems for HoloLens 2 hand tracking, where the computational budget is approximately 100 times smaller than an iPhone 7, we introduce a new surface model: the ‘Phong surface’. |
41 | Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video | Miao Liu; Siyu Tang; Yin Li; James M. Rehg; | Motivated by this observation, we adopt intentional hand movement as a feature representation, and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. |
42 | Learning Stereo from Single Images | Jamie Watson; Oisin Mac Aodha; Daniyar Turmukhambetov; Gabriel J. Brostow; Michael Firman; | We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs. |
43 | Prototype Rectification for Few-Shot Learning | Jinlu Liu; Liang Song; Yongqiang Qin; | In this paper, we figure out two key influencing factors of the process: the intra-class bias and the cross-class bias. We then propose a simple yet effective approach for prototype rectification in transductive setting. |
44 | Learning Feature Descriptors using Camera Pose Supervision | Qianqian Wang; Xiaowei Zhou; Bharath Hariharan; Noah Snavely; | In this paper we propose a novel weakly-supervised framework that can learn feature descriptors solely from relative camera poses between images. |
45 | Semantic Flow for Fast and Accurate Scene Parsing | Xiangtai Li; Ansheng You; Zhen Zhu; Houlong Zhao; Maoke Yang; Kuiyuan Yang; Shaohua Tan; Yunhai Tong; | In this paper, we focus on designing an effective method for fast and accurate scene parsing. |
46 | Appearance Consensus Driven Self-Supervised Human Mesh Recovery | Jogendra Nath Kundu; Mugalodi Rakesh; Varun Jampani; Rahul Mysore Venkatesh; R. Venkatesh Babu; | We present a self-supervised human mesh recovery framework to infer human pose and shape from monocular images in the absence of any paired supervision. |
47 | Diffraction Line Imaging | Mark Sheinin; Dinesh N. Reddy; Matthew O’Toole; Srinivasa G. Narasimhan; | We present a novel computational imaging principle that combines diffractive optics with line (1D) sensing. |
48 | Aligning and Projecting Images to Class-conditional Generative Networks | Minyoung Huh; Richard Zhang; Jun-Yan Zhu; Sylvain Paris; Aaron Hertzmann; | We present a method for projecting an input image into the space of a class-conditional generative neural network. |
49 | Suppress and Balance: A Simple Gated Network for Salient Object Detection | Xiaoqi Zhao; Youwei Pang; Lihe Zhang; Huchuan Lu; Lei Zhang; | In this work, we propose a simple gated network (GateNet) to solve both issues at once. |
50 | Visual Memorability for Robotic Interestingness via Unsupervised Online Learning | Chen Wang; Wenshan Wang; Yuheng Qiu; Yafei Hu; Sebastian Scherer; | In this paper, we explore the problem of interesting scene prediction for mobile robots. |
51 | Post-Training Piecewise Linear Quantization for Deep Neural Networks | Jun Fang; Ali Shafiee; Hamzah Abdel-Aziz; David Thorsley; Georgios Georgiadis; Joseph H. Hassoun; | In this paper, we propose a PieceWise Linear Quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. |
52 | Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification | Yang Zou; Xiaodong Yang; Zhiding Yu; B.V.K. Vijaya Kumar; Jan Kautz; | In this paper, we seek to improve adaptation by purifying the representation space to be adapted. |
53 | In-Home Daily-Life Captioning Using Radio Signals | Lijie Fan; Tianhong Li; Yuan Yuan; Dina Katabi; | We introduce RF-Diary, a new model for captioning daily life by analyzing the privacy-preserving radio signal in the home with the home’s floormap. |
54 | Self-Challenging Improves Cross-Domain Generalization | Zeyi Huang; Haohan Wang; Eric P. Xing; Dong Huang; | We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNN to the out-of-domain data. |
55 | A Competence-aware Curriculum for Visual Concepts Learning via Question Answering | Qing Li; Siyuan Huang; Yining Hong; Song-Chun Zhu; | To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. |
56 | Multitask Learning Strengthens Adversarial Robustness | Chengzhi Mao; Amogh Gupta; Vikram Nitin; Baishakhi Ray; Shuran Song; Junfeng Yang; Carl Vondrick; | We present both theoretical and empirical analyses that connect the adversarial robustness of a model to the number of tasks that it is trained on. |
57 | S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search | Zhihang Yuan; Bingzhe Wu; Guangyu Sun; Zheng Liang; Shiwan Zhao; Weichen Bi; | In this paper, we introduce a general framework, S2DNAS, which can transform various static CNN models to support dynamic inference via neural architecture search. |
58 | Improving Deep Video Compression by Resolution-adaptive Flow Coding | Zhihao Hu; Zhenghao Chen; Dong Xu; Guo Lu; Wanli Ouyang; Shuhang Gu; | In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder. |
59 | Motion Capture from Internet Videos | Junting Dong; Qing Shuai; Yuanqing Zhang; Xian Liu; Xiaowei Zhou; Hujun Bao; | To address these challenges, we propose a novel optimization-based framework and experimentally demonstrate its ability to recover much more precise and detailed motion from multiple videos, compared against monocular motion capture methods. |
60 | Appearance-Preserving 3D Convolution for Video-based Person Re-identification | Xinqian Gu; Hong Chang; Bingpeng Ma; Hongkai Zhang; Xilin Chen; | To address this problem, we propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel. |
61 | Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization | Dylan Campbell; Liu Liu; Stephen Gould; | We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. |
62 | Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation | Xingang Pan; Xiaohang Zhan; Bo Dai; Dahua Lin; Chen Change Loy; Ping Luo; | This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. |
63 | Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures | Mantang Guo; Junhui Hou; Jing Jin; Jie Chen; Lap-Pui Chau; | To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. |
64 | Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling | Xuesong Niu; Zitong Yu; Hu Han; Xiaobai Li; Shiguang Shan; Guoying Zhao; | To address these challenges, we propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations such as head movements and lighting conditions, and then use the distilled physiological features for robust multi-task physiological measurements. |
65 | Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction | Bharat Lal Bhatnagar; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll; | Given sparse 3D point clouds sampled on the surface of a dressed person, we use an Implicit Part Network (IP-Net) to jointly predict the outer 3D surface of the dressed person, the inner body surface, and the semantic correspondences to a parametric body model. |
66 | Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network | Tsai-Shien Chen; Chih-Ting Liu; Chih-Wei Wu; Shao-Yi Chien; | In this work, we propose a dedicated Semantics-guided Part Attention Network (SPAN) to robustly predict part attention masks for different views of vehicles given only image-level semantic labels during training. |
67 | Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation | Guolei Sun; Wenguan Wang; Jifeng Dai; Luc Van Gool; | This paper studies the problem of learning semantic segmentation from image-level supervision only. |
68 | CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image | Stefan Popov; Pablo Bauszat; Vittorio Ferrari; | Building on common encoder-decoder architectures for this task, we propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner (2) a hybrid 3D volume representation that enables building translation equivariant models, while at the same time encoding fine object details without an excessive memory footprint (3) a reconstruction loss tailored to capture overall object geometry. |
69 | Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs | Lei Huang; Jie Qin; Li Liu; Fan Zhu; Ling Shao; | To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently. |
70 | RAFT: Recurrent All-Pairs Field Transforms for Optical Flow | Zachary Teed; Jia Deng; | We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for estimating optical flow. |
71 | Domain-invariant Stereo Matching Networks | Feihu Zhang; Xiaojuan Qi; Ruigang Yang; Victor Prisacariu; Benjamin Wah; Philip Torr; | In this paper, we aim at designing a domain-invariant stereo matching network (DSMNet) that generalizes well to unseen scenes. |
72 | DeepHandMesh: A Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling | Gyeongsik Moon; Takaaki Shiratori; Kyoung Mu Lee; | In this study, we firstly propose DeepHandMesh, a weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling. |
73 | Content Adaptive and Error Propagation Aware Deep Video Compression | Guo Lu; Chunlei Cai; Xiaoyun Zhang; Li Chen; Wanli Ouyang; Dong Xu; Zhiyong Gao; | To address these two problems, we propose a content adaptive and error propagation aware video compression system. |
74 | Towards Streaming Perception | Mengtian Li; Yu-Xiong Wang; Deva Ramanan; | To these ends, we present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, which we refer to as "streaming accuracy". |
75 | Towards Automated Testing and Robustification by Semantic Adversarial Data Generation | Rakshith Shetty; Mario Fritz; Bernt Schiele; | In this work we propose semantic adversarial editing, a method to synthesize plausible but difficult data points on which our target model breaks down. |
76 | Adversarial Generative Grammars for Human Activity Prediction | AJ Piergiovanni; Anelia Angelova; Alexander Toshev; Michael S. Ryoo; | In this paper we propose an adversarial generative grammar model for future prediction. |
77 | GDumb: A Simple Approach that Questions Our Progress in Continual Learning | Ameya Prabhu; Philip H. S. Torr; Puneet K. Dokania; | To validate this, we propose GDumb that (1) greedily stores samples in memory as they come and (2) at test time, trains a model from scratch using samples only in the memory. |
78 | Learning Lane Graph Representations for Motion Forecasting | Ming Liang; Bin Yang; Rui Hu; Yun Chen; Renjie Liao; Song Feng; Raquel Urtasun; | We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. |
79 | What Matters in Unsupervised Optical Flow | Rico Jonschkowski; Austin Stone; Jonathan T. Barron; Ariel Gordon; Kurt Konolige; Anelia Angelova; | By combining the results of our investigation with our improved model components, we are able to present a new unsupervised flow technique that significantly outperforms the previous unsupervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches. |
80 | Synthesis and Completion of Facades from Satellite Imagery | Xiaowei Zhang; Christopher May; Daniel Aliaga; | We present a machine learning-based inverse procedural modeling method to automatically create synthetic facades from satellite imagery. |
81 | Mapillary Planet-Scale Depth Dataset | Manuel López Antequera; Pau Gargallo; Markus Hofinger; Samuel Rota Bulò; Yubin Kuang; Peter Kontschieder; | We introduce a new depth dataset that is an order of magnitude larger than previous datasets, but more importantly, contains an unprecedented gamut of locations, camera models and scene types while offering metric depth (not just up-to-scale). |
82 | V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction | Tsun-Hsuan Wang; Sivabalan Manivasagam; Ming Liang; Bin Yang; Wenyuan Zeng; Raquel Urtasun; | In this paper, we explore the use of vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles. |
83 | Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters | Haoyu Liang; Zhihao Ouyang; Yuyuan Zeng; Hang Su; Zihao He; Shu-Tao Xia; Jun Zhu; Bo Zhang; | Inspired by cellular differentiation, we propose a novel strategy to train interpretable CNNs by encouraging class-specific filters, among which each filter responds to only one (or few) class. |
84 | EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning | Bailin Li; Bowen Wu; Jiang Su; Guangrun Wang; | In this work, we present a pruning method called EagleEye, in which a simple yet efficient evaluation component based on adaptive batch normalization is applied to unveil a strong correlation between different pruned DNN structures and their final settled accuracy. |
85 | Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation | Marie-Julie Rakotosaona; Maks Ovsjanikov; | We present a learning-based method for interpolating and manipulating 3D shapes represented as point clouds, that is explicitly designed to preserve intrinsic shape properties. |
86 | Cross-Domain Cascaded Deep Translation | Oren Katzir; Dani Lischinski; Daniel Cohen-Or; | We mitigate this by descending the deep layers of a pre-trained network, where the deep features contain more semantics, and applying the translation between these deep features. |
87 | “Look Ma, no landmarks!” – Unsupervised, Model-based Dense Face Alignment | Tatsuro Koizumi; William A. P. Smith; | In this paper, we show how to train an image-to-image network to predict dense correspondence between a face image and a 3D morphable model using only the model for supervision. |
88 | Online Invariance Selection for Local Feature Descriptors | Rémi Pautrat; Viktor Larsson; Martin R. Oswald; Marc Pollefeys; | We propose to overcome this limitation with a disentanglement of invariance in local descriptors and with an online selection of the most appropriate invariance given the context. |
89 | Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations | Hongyu Liu; Bin Jiang; Yibing Song; Wei Huang; Chao Yang; | In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. |
90 | TextCaps: a Dataset for Image Captioning with Reading Comprehension | Oleksii Sidorov; Ronghang Hu; Marcus Rohrbach; Amanpreet Singh; | To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images. |
91 | It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction | Karttikeya Mangalam; Harshayu Girase; Shreyas Agarwal; Kuan-Hui Lee; Ehsan Adeli; Jitendra Malik; Adrien Gaidon; | In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction. |
92 | Learning What to Learn for Video Object Segmentation | Goutam Bhat; Felix Järemo Lawin; Martin Danelljan; Andreas Robinson; Michael Felsberg; Luc Van Gool; Radu Timofte; | We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. |
93 | SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing | Garvita Tiwari; Bharat Lal Bhatnagar; Tony Tung; Gerard Pons-Moll; | In this paper, we introduce SizerNet to predict 3D clothing conditioned on human body shape and garment size parameters, and ParserNet to infer garment meshes and shape under clothing with personal details in a single pass from an input mesh. |
94 | LIMP: Learning Latent Shape Representations with Metric Preservation Priors | Luca Cosmo; Antonio Norelli; Oshri Halimi; Ron Kimmel; Emanuele Rodolà; | In this paper, we advocate the adoption of metric preservation as a powerful prior for learning latent representations of deformable 3D shapes. |
95 | Unsupervised Sketch to Photo Synthesis | Runtao Liu; Qian Yu; Stella X. Yu; | We study unsupervised sketch to photo synthesis for the first time, learning from unpaired sketch and photo data where the target photo for a sketch is unknown during training. |
96 | A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions | Evgenia Rusak; Lukas Schott; Roland S. Zimmermann; Julian Bitterwolf; Oliver Bringmann; Matthias Bethge; Wieland Brendel; | Here, we demonstrate that a simple but properly tuned training with additive Gaussian and Speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the state of the art on the corruption benchmark ImageNet-C (with ResNet50) and on MNIST-C. |
97 | SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification | Yida Wang; David Joseph Tan; Nassir Navab; Federico Tombari; | In this paper, we propose a method for 3D object completion and classification based on point clouds. |
98 | Hierarchical Face Aging through Disentangled Latent Characteristics | Peipei Li; Huaibo Huang; Yibo Hu; Xiang Wu; Ran He; Zhenan Sun; | To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information. |
99 | Hybrid Models for Open Set Recognition | Hongjie Zhang; Ang Li; Jie Guo; Yanwen Guo; | We propose the OpenHybrid framework, which is composed of an encoder to encode the input data into a joint embedding space, a classifier to classify samples to inlier classes, and a flow-based density estimator to detect whether a sample belongs to the unknown category. |
100 | TopoGAN: A Topology-Aware Generative Adversarial Network | Fan Wang; Huidong Liu; Dimitris Samaras; Chao Chen; | In this paper, we propose a novel GAN model that learns the topology of real images, i.e., connectedness and loopy-ness. |
101 | Learning to Localize Actions from Moments | Fuchen Long; Ting Yao; Zhaofan Qiu; Xinmei Tian; Jiebo Luo; Tao Mei; | In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes. |
102 | ForkGAN: Seeing into the Rainy Night | Ziqiang Zheng; Yang Wu; Xinran Han; Jianbo Shi; | We present a ForkGAN for task-agnostic image translation that can boost multiple vision tasks in adverse weather conditions. |
103 | TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning | Xinwei Sun; Yilun Xu; Peng Cao; Yuqing Kong; Lingjing Hu; Shanghang Zhang; Yizhou Wang; | In this paper, we propose a novel information-theoretic approach, namely Total Correlation Gain Maximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can effectively utilize the information across different modalities of unlabeled data points to facilitate training classifiers of each modality; (ii) it has a theoretical guarantee to identify Bayesian classifiers, i.e., the ground truth posteriors of all modalities. |
104 | ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval | Quan Cui; Qing-Yuan Jiang; Xiu-Shen Wei; Wu-Jun Li; Osamu Yoshie; | In this paper, we study the novel fine-grained hashing topic to generate compact binary codes for fine-grained images, leveraging the search and storage efficiency of hash learning to alleviate the aforementioned problems. |
105 | TSIT: A Simple and Versatile Framework for Image-to-Image Translation | Liming Jiang; Changxu Zhang; Mingyang Huang; Chunxiao Liu; Jianping Shi; Chen Change Loy; | We introduce a simple and versatile framework for image-to-image translation. |
106 | ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices | Xiangyu He; Zitao Mo; Ke Cheng; Weixiang Xu; Qinghao Hu; Peisong Wang; Qingshan Liu; Jian Cheng; | In this paper, by introducing an appropriate proxy matrix, we reduce the weights quantization error while circumventing explicit binary regularizations on the full-precision auxiliary variables. |
107 | HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation | Can Wang; Jiefeng Li; Wentao Liu; Chen Qian; Cewu Lu; | In this paper, we attempt to address the lack of a global perspective of the top-down approaches by introducing a novel form of supervision – Hierarchical Multi-person Ordinal Relations (HMOR). |
108 | Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve | Weicheng Kuo; Anelia Angelova; Tsung-Yi Lin; Angela Dai; | We present Mask2CAD, which jointly detects objects in real-world images and for each detected object, optimizes for the most similar CAD model and its pose. |
109 | A Unified Framework of Surrogate Loss by Refactoring and Interpolation | Lanlan Liu; Mingzhe Wang; Jia Deng; | We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent, reducing the amount of manual design of task-specific surrogate losses. |
110 | Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images | Sai Bi; Zexiang Xu; Kalyan Sunkavalli; Miloš Hašan; Yannick Hold-Geoffroy; David Kriegman; Ravi Ramamoorthi; | We present a deep learning approach to reconstruct scene appearance from unstructured images captured under collocated point lighting. |
111 | Memory-augmented Dense Predictive Coding for Video Representation Learning | Tengda Han; Weidi Xie; Andrew Zisserman; | The objective of this paper is self-supervised learning from video, in particular for representations for action recognition. |
112 | PointMixup: Augmentation for Point Clouds | Yunlu Chen; Vincent Tao Hu; Efstratios Gavves; Thomas Mensink; Pascal Mettes; Pengwan Yang; Cees G. M. Snoek; | In this paper, we define data augmentation between point clouds as a shortest path linear interpolation. |
113 | Identity-Guided Human Semantic Parsing for Person Re-Identification | Kuan Zhu; Haiyun Guo; Zhiwei Liu; Ming Tang; Jinqiao Wang; | In this paper, we propose the identity-guided human semantic parsing approach (ISP) to locate both the human body parts and personal belongings at pixel-level for aligned person re-ID only with person identity labels. |
114 | Learning Gradient Fields for Shape Generation | Ruojin Cai; Guandao Yang; Hadar Averbuch-Elor; Zekun Hao; Serge Belongie; Noah Snavely; Bharath Hariharan; | In this work, we propose a novel technique to generate shapes from point cloud data. |
115 | COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder | Kuniaki Saito; Kate Saenko; Ming-Yu Liu; | To address the issue, we propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image and a new module called the constant style bias. |
116 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | Kaiwen Duan; Lingxi Xie; Honggang Qi; Song Bai; Qingming Huang; Qi Tian; | This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals by finding potential corner keypoint combinations and then assigns a class label to each proposal by a standalone classification stage. |
117 | PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click | Henghui Ding; Scott Cohen; Brian Price; Xudong Jiang; | We propose to employ phrase expressions as another interaction input to infer the attributes of target object. |
118 | Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing | Yapeng Tian; Dingzeyu Li; Chenliang Xu; | In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both. |
119 | Learning Delicate Local Representations for Multi-Person Pose Estimation | Yuanhao Cai; Zhicheng Wang; Zhengxiong Luo; Binyi Yin; Angang Du; Haoqian Wang; Xiangyu Zhang; Xinyu Zhou; Erjin Zhou; Jian Sun; | In this paper, we propose a novel method called Residual Steps Network (RSN). |
120 | Learning to Plan with Uncertain Topological Maps | Edward Beeching; Jilles Dibangoye; Olivier Simonin; Christian Wolf; | Our main contribution is a data driven learning based approach for planning under uncertainty in topological maps, requiring an estimate of shortest paths in valued graphs with a probabilistic structure. |
121 | Neural Design Network: Graphic Layout Generation with Constraints | Hsin-Ying Lee; Lu Jiang; Irfan Essa; Phuong B Le; Haifeng Gong; Ming-Hsuan Yang; Weilong Yang; | We propose a method for design layout generation that can satisfy user-specified constraints. |
122 | Learning Open Set Network with Discriminative Reciprocal Points | Guangyao Chen; Limeng Qiao; Yemin Shi; Peixi Peng; Jia Li; Tiejun Huang; Shiliang Pu; Yonghong Tian; | In this paper, we propose a new concept, Reciprocal Point, which is the potential representation of the extra-class space corresponding to each known category. |
123 | Convolutional Occupancy Networks | Songyou Peng; Michael Niemeyer; Lars Mescheder; Marc Pollefeys; Andreas Geiger; | In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. |
124 | Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry | He Chen; Pengfei Guo; Pengfei Li; Gim Hee Lee; Gregory Chirikjian; | In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. |
125 | TIDE: A General Toolbox for Identifying Object Detection Errors | Daniel Bolya; Sean Foley; James Hays; Judy Hoffman; | We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms. |
126 | PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding | Saining Xie; Jiatao Gu; Demi Guo; Charles R. Qi; Leonidas Guibas; Or Litany; | In this work, we aim at facilitating research on 3D representation learning. |
127 | DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation | Xuefei Ning; Tianchen Zhao; Wenshuo Li; Peng Lei; Yu Wang; Huazhong Yang; | In this paper, we propose Differentiable Sparsity Allocation (DSA), an efficient end-to-end budgeted pruning flow. |
128 | Circumventing Outliers of AutoAugment with Knowledge Distillation | Longhui Wei; An Xiao; Lingxi Xie; Xiaopeng Zhang; Xin Chen; Qi Tian; | This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of discriminative information from the training image and so insisting on the ground-truth label is no longer the best option. |
129 | S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching | Hugo Germain; Guillaume Bourmaud; Vincent Lepetit; | In this paper, we introduce S2DNet, a novel feature matching pipeline, designed and trained to efficiently establish both robust and accurate correspondences. |
130 | RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving | Peixuan Li; Huaici Zhao; Pengfei Liu; Feidao Cao; | In this work, we propose an efficient and accurate monocular 3D detection framework in single shot. |
131 | Video Object Segmentation with Episodic Graph Memory Networks | Xiankai Lu; Wenguan Wang; Martin Danelljan; Tianfei Zhou; Jianbing Shen; Luc Van Gool; | In this work, a graph memory network is developed to address the novel idea of “learning to update the segmentation model”. |
132 | Rethinking Bottleneck Structure for Efficient Mobile Network Design | Daquan Zhou; Qibin Hou; Yunpeng Chen; Jiashi Feng; Shuicheng Yan; | In this paper, we rethink the necessity of such design change and find it may bring risks of information loss and gradient confusion. |
133 | Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks | Jeffrey O. Zhang; Alexander Sax; Amir Zamir; Leonidas Guibas; Jitendra Malik; | The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning. |
134 | Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach | Zerui Chen; Yan Huang; Hongyuan Yu; Bin Xue; Ke Han; Yiru Guo; Liang Wang; | To accurately estimate 3D poses of different body parts, we attempt to build a part-aware 3D pose estimator by searching a set of network architectures. |
135 | REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets | Angelina Wang; Arvind Narayanan; Olga Russakovsky; | Overall, the key aim of our work is to tackle the machine learning bias problem early in the pipeline. |
136 | Contrastive Learning for Weakly Supervised Phrase Grounding | Tanmay Gupta; Arash Vahdat; Gal Chechik; Xiaodong Yang; Jan Kautz; Derek Hoiem; | We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. |
137 | Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis | Siyuan Yang; Jun Liu; Shijian Lu; Meng Hwa Er; Alex C. Kot; | In this paper, we present a novel collaborative learning network for joint gesture recognition and 3D hand pose estimation. |
138 | Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors | Zuxuan Wu; Ser-Nam Lim; Larry S. Davis; Tom Goldstein; | We present a systematic study of adversarial attacks on state-of-the-art object detection frameworks. |
139 | TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images | Jianxin Lin; Yingxue Pang; Yingce Xia; Zhibo Chen; Jiebo Luo; | In this paper, we argue that even if each domain contains a single image, UI2I can still be achieved. |
140 | Semi-Siamese Training for Shallow Face Learning | Hang Du; Hailin Shi; Yuchi Liu; Jun Wang; Zhen Lei; Dan Zeng; Tao Mei; | In this paper, we aim to address the problem by introducing a novel training method named Semi-Siamese Training (SST). |
141 | GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework | Haotao Wang; Shupeng Gui; Haichuan Yang; Ji Liu; Zhangyang Wang; | To this end, we propose the first end-to-end optimization framework combining multiple compression means for GAN compression, dubbed GAN Slimming (GS). |
142 | Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition | Yukun Su; Guosheng Lin; Jinhui Zhu; Qingyao Wu; | This paper introduces a new method for recognizing violent behavior by learning contextual relationships between related people from human skeleton points. |
143 | Binarized Neural Network for Single Image Super Resolution | Jingwei Xin; Nannan Wang; Xinrui Jiang; Jie Li; Heng Huang; Xinbo Gao; | We propose a simple but effective binary neural networks (BNN) based SISR model with a novel binarization scheme. |
144 | Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | Huiyu Wang; Yukun Zhu; Bradley Green; Hartwig Adam; Alan Yuille; Liang-Chieh Chen; | In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. |
145 | Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation | Zhipeng Fan; Jun Liu; Yao Wang; | In this paper, we investigate the problem of reducing the overall computation cost yet maintaining the high accuracy for 3D hand pose estimation from video sequences. |
146 | Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking | Jinlong Peng; Changan Wang; Fangbin Wan; Yang Wu; Yabiao Wang; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Yanwei Fu; | Going beyond these sub-optimal frameworks, we propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution (the first as far as we know). |
147 | Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets | Tong Wu; Qingqiu Huang; Ziwei Liu; Yu Wang; Dahua Lin; | We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions. |
148 | Hamiltonian Dynamics for Real-World Shape Interpolation | Marvin Eisenberger; Daniel Cremers; | We revisit the classical problem of 3D shape interpolation and propose a novel, physically plausible approach based on Hamiltonian dynamics. |
149 | Learning to Scale Multilingual Representations for Vision-Language Tasks | Andrea Burns; Donghyun Kim; Derry Wijaya; Kate Saenko; Bryan A. Plummer; | In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. |
150 | Multi-modal Transformer for Video Retrieval | Valentin Gabeur; Chen Sun; Karteek Alahari; Cordelia Schmid; | In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others. |
151 | Feature Representation Matters: End-to-End Learning for Reference-based Image Super-resolution | Yanchun Xie; Jimin Xiao; Mingjie Sun; Chao Yao; Kaizhu Huang; | In this paper, we are aiming for a general reference-based super-resolution setting: it does not require the low-resolution image and the high-resolution reference image to be well aligned or with a similar texture. |
152 | RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera | Zhuo Su; Lan Xu; Zerong Zheng; Tao Yu; Yebin Liu; Lu Fang; | In this paper, inspired by the huge potential of learning-based human modeling, we propose RobustFusion, a robust human performance capture system combined with various data-driven visual cues using a single RGBD camera. |
153 | Surface Normal Estimation of Tilted Images via Spatial Rectifier | Tien Do; Khiem Vuong; Stergios I. Roumeliotis; Hyun Soo Park; | In this paper, we present a spatial rectifier to estimate surface normals of tilted images. |
154 | Multimodal Shape Completion via Conditional Generative Adversarial Networks | Rundi Wu; Xuelin Chen; Yixin Zhuang; Baoquan Chen; | Hence, we pose a multi-modal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping. |
155 | Generative Sparse Detection Networks for 3D Single-shot Object Detection | JunYoung Gwak; Christopher Choy; Silvio Savarese; | To this end, we propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network that efficiently generates the support for object proposals. |
156 | Grounded Situation Recognition | Sarah Pratt; Mark Yatskar; Luca Weihs; Ali Farhadi; Aniruddha Kembhavi; | We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. |
157 | Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos | Shaoxiang Chen; Wenhao Jiang; Wei Liu; Yu-Gang Jiang; | Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks. |
158 | Unpaired Learning of Deep Image Denoising | Xiaohe Wu; Ming Liu; Yue Cao; Dongwei Ren; Wangmeng Zuo; | We investigate the task of learning blind image denoising networks from an unpaired set of clean and noisy images. |
159 | Self-supervising Fine-grained Region Similarities for Large-scale Image Localization | Yixiao Ge; Haibo Wang; Feng Zhu; Rui Zhao; Hongsheng Li; | To tackle this challenge, we propose to self-supervise image-to-region similarities in order to fully explore the potential of difficult positive images alongside their sub-regions. |
160 | Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video | Youngjoong Kwon; Stefano Petrangeli; Dahun Kim; Haoliang Wang; Eunbyung Park; Viswanathan Swaminathan; Henry Fuchs; | To tackle these challenges, we introduce a human-specific framework that employs a learned 3D-aware representation. |
161 | Side-Aware Boundary Localization for More Precise Object Detection | Jiaqi Wang; Wenwei Zhang; Yuhang Cao; Kai Chen; Jiangmiao Pang; Tao Gong; Jianping Shi; Chen Change Loy; Dahua Lin; | In this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. |
162 | SF-Net: Single-Frame Supervision for Temporal Action Localization | Fan Ma; Linchao Zhu; Yi Yang; Shengxin Zha; Gourab Kundu; Matt Feiszli; Zheng Shou; | In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL). |
163 | Negative Margin Matters: Understanding Margin in Few-shot Classification | Bin Liu; Yue Cao; Yutong Lin; Qi Li; Zheng Zhang; Mingsheng Long; Han Hu; | In this paper, we unconventionally propose to adopt appropriate negative-margin to softmax loss for few-shot classification, which surprisingly works well for the open-set scenarios of few-shot classification. |
164 | Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References | Ruizheng Wu; Xin Tao; Yingcong Chen; Xiaoyong Shen; Jiaya Jia; | We accordingly propose a new multi-reference identity transfer framework by simultaneously making use of particularity and commonality of reference. |
165 | Tracking Objects as Points | Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl; | In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. |
166 | CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis | Jiadong Liang; Wenjie Pei; Feng Lu; | In this paper we circumvent this problem by focusing on parsing the content of both the input text and the synthesized image thoroughly to model the text-to-image consistency in the semantic level. |
167 | Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning | Fariborz Taherkhani; Ali Dabouei; Sobhan Soleymani; Jeremy Dawson; Nasser M. Nasrabadi; | In this work, we consider the general setting of the SSL problem for image classification, where the labeled and unlabeled data come from the same underlying distribution. |
168 | MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning | Simon Vandenhende; Stamatios Georgoulis; Luc Van Gool; | In this paper, we argue about the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup. |
169 | Learning to Factorize and Relight a City | Andrew Liu; Shiry Ginosar; Tinghui Zhou; Alexei A. Efros; Noah Snavely; | We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. |
170 | Region Graph Embedding Network for Zero-Shot Learning | Guo-Sen Xie; Li Liu; Fan Zhu; Fang Zhao; Zheng Zhang; Yazhou Yao; Jie Qin; Ling Shao; | In this paper, to model the relations among local image regions, we incorporate the region-based relation reasoning into ZSL. |
171 | GRAB: A Dataset of Whole-Body Human Grasping of Objects | Omid Taheri; Nima Ghorbani; Michael J. Black; Dimitrios Tzionas; | Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. |
172 | DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects | Edgar Tretschk; Ayush Tewari; Michael Zollhöfer; Vladislav Golyanik; Christian Theobalt; | We propose a general-purpose DEep MEsh Autoencoder (DEMEA) which adds a novel embedded deformation layer to a graph-convolutional mesh autoencoder. |
173 | RANSAC-Flow: Generic Two-stage Image Alignment | Xi Shen; François Darmon; Alexei A. Efros; Mathieu Aubry; | We propose a two-stage process: first, a feature-based parametric coarse alignment using one or more homographies, followed by non-parametric fine pixel-wise alignment. |
174 | Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds | Arun Balajee Vasudevan; Dengxin Dai; Luc Van Gool; | We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a $360^{ |
175 | Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images | Kiru Park; Timothy Patten; Markus Vincze; | This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images. |
176 | Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking | Jianfeng Yan; Zizhuang Wei; Hongwei Yi; Mingyu Ding; Runze Zhang; Yisong Chen; Guoping Wang; Yu-Wing Tai; | In this paper, we propose an efficient and effective dense hybrid recurrent multi-view stereo net with dynamic consistency checking, namely $D^{2}$HC-RMVSNet, for accurate dense point cloud reconstruction. |
177 | Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference & Application | Xuchong Qiu; Yang Xiao; Chaohui Wang; Renaud Marlet; | The former provides a way to generate large-scale accurate occlusion datasets while, based on the latter, we propose a novel method for task-independent pixel-level occlusion relationship estimation from single images. |
178 | MovieNet: A Holistic Dataset for Movie Understanding | Qingqiu Huang; Yu Xiong; Anyi Rao; Jiaze Wang; Dahua Lin; | In this paper, we introduce MovieNet — a holistic dataset for movie understanding. |
179 | Short-Term and Long-Term Context Aggregation Network for Video Inpainting | Ang Li; Shanshan Zhao; Xingjun Ma; Mingming Gong; Jianzhong Qi; Rui Zhang; Dacheng Tao; Ramamohanarao Kotagiri; | In this work, we present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting. |
180 | DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization | Juan Du; Rui Wang; Daniel Cremers; | For relocalization in large-scale point clouds, we propose the first approach that unifies global place recognition and local 6DoF pose refinement. |
181 | Face Super-Resolution Guided by 3D Facial Priors | Xiaobin Hu; Wenqi Ren; John LaMaster; Xiaochun Cao; Xiaoming Li; Zechao Li; Bjoern Menze; Wei Liu; | In this paper, we propose a novel face super-resolution method that explicitly incorporates 3D facial priors which grasp the sharp facial structures. |
182 | Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation | Yabin Zhang; Bin Deng; Kui Jia; Lei Zhang; | In this work, we take a step further to study the proper extensions of SSL techniques for UDA. |
183 | Are Labels Necessary for Neural Architecture Search? | Chenxi Liu; Piotr Dollár; Kaiming He; Ross Girshick; Alan Yuille; Saining Xie; | In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels? |
184 | BLSM: A Bone-Level Skinned Model of the Human Mesh | Haoyang Wang; Riza Alp Güler; Iasonas Kokkinos; George Papandreou; Stefanos Zafeiriou; | We introduce BLSM, a bone-level skinned model of the human body mesh where bone scales are set prior to template synthesis, rather than the common, inverse practice. |
185 | Associative Alignment for Few-shot Image Classification | Arman Afrasiyabi; Jean-François Lalonde; Christian Gagné; | This paper proposes the idea of associative alignment for leveraging part of the base data by aligning the novel training instances to the closely related ones in the base training set. |
186 | Cyclic Functional Mapping: Self-supervised Correspondence between Non-isometric Deformable Shapes | Dvir Ginzburg; Dan Raviv; | We present the first utterly self-supervised network for dense correspondence mapping between non-isometric shapes. |
187 | View-Invariant Probabilistic Embedding for Human Pose | Jennifer J. Sun; Jiaping Zhao; Liang-Chieh Chen; Florian Schroff; Hartwig Adam; Ting Liu; | In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. |
188 | Contact and Human Dynamics from Monocular Video | Davis Rempe; Leonidas J. Guibas; Aaron Hertzmann; Bryan Russell; Ruben Villegas; Jimei Yang; | In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. |
189 | PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation | Wenxuan Wu; Zhi Yuan Wang; Zhuwen Li; Wei Liu; Li Fuxin; | We propose a novel end-to-end deep scene flow model, called PointPWC-Net, that directly processes 3D point cloud scenes with large motions in a coarse-to-fine fashion. |
190 | Points2Surf Learning Implicit Surfaces from Point Clouds | Philipp Erler; Paul Guerrero; Stefan Ohrhallinger; Niloy J. Mitra; Michael Wimmer; | We present Points2Surf, a novel patch-based learning framework that produces accurate surfaces directly from raw scans without normals. |
191 | Few-Shot Scene-Adaptive Anomaly Detection | Yiwei Lu; Frank Yu; Mahesh Kumar Krishna Reddy; Yang Wang; | In this paper, we propose a novel few-shot scene-adaptive anomaly detection problem to address the limitations of previous approaches. |
192 | Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting | Bindita Chaudhuri; Noranart Vesdapunt; Linda Shapiro; Baoyuan Wang; | We propose an end-to-end framework that jointly learns a personalized face model per user and per-frame facial motion parameters from a large corpus of in-the-wild videos of user expressions. |
193 | Entropy Minimisation Framework for Event-based Vision Model Estimation | Urbano Miguel Nunes; Yiannis Demiris; | We propose a novel EMin framework for event-based vision model estimation. |
194 | Reconstructing NBA Players | Luyang Zhu; Konstantinos Rematas; Brian Curless; Steven M. Seitz; Ira Kemelmacher-Shlizerman; | Based on these models, we introduce a new method that takes as input a single photo of a clothed player performing any basketball pose and outputs a high resolution mesh and pose of that player. |
195 | PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments | Zhiming Chen; Kean Chen; Weiyao Lin; John See; Hui Yu; Yan Ke; Cong Yang; | Therefore, a novel loss, Pixels-IoU (PIoU) Loss, is formulated to exploit both the angle and IoU for accurate OBB regression. |
196 | TENet: Triple Excitation Network for Video Salient Object Detection | Sucheng Ren; Chu Han; Xin Yang; Guoqiang Han; Shengfeng He; | In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects, spatial, temporal, and online excitations. |
197 | Deep Feedback Inverse Problem Solver | Wei-Chiu Ma; Shenlong Wang; Jiayuan Gu; Sivabalan Manivasagam; Antonio Torralba; Raquel Urtasun; | We present an efficient, effective, and generic approach towards solving inverse problems. |
198 | Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification | Liuyu Xiang; Guiguang Ding; Jungong Han; | In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). |
199 | Hallucinating Visual Instances in Total Absentia | Jiayan Qiu; Yiding Yang; Xinchao Wang; Dacheng Tao; | In this paper, we investigate a new visual restoration task, termed as hallucinating visual instances in total absentia (HVITA). |
200 | Weakly-supervised 3D Shape Completion in the Wild | Jiayuan Gu; Wei-Chiu Ma; Sivabalan Manivasagam; Wenyuan Zeng; Zihao Wang; Yuwen Xiong; Hao Su; Raquel Urtasun; | To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance. |
201 | DTVNet: Dynamic Time-lapse Video Generation via Single Still Image | Jiangning Zhang; Chao Xu; Liang Liu; Mengmeng Wang; Xia Wu; Yong Liu; Yunliang Jiang; | This paper presents a novel end-to-end dynamic time-lapse video generation framework, named DTVNet, to generate diversified time-lapse videos from a single landscape image, which are conditioned on normalized motion vectors. |
202 | CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss | Lijun Wang; Jianming Zhang; Yifan Wang; Huchuan Lu; Xiang Ruan; | This paper proposes a hierarchical loss for monocular depth estimation, which measures the differences between the prediction and ground truth in hierarchical embedding spaces of depth maps. |
203 | Collaborative Video Object Segmentation by Foreground-Background Integration | Zongxin Yang; Yunchao Wei; Yi Yang; | This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation. |
204 | Adaptive Margin Diversity Regularizer for handling Data Imbalance in Zero-Shot SBIR | Titir Dutta; Anurag Singh; Soma Biswas; | Since most real-world training data have a fair amount of imbalance, in this work, for the first time in literature, we extensively study the effect of training data imbalance on the generalization to unseen categories, with ZS-SBIR as the application area. |
205 | ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation | Xucong Zhang; Seonwook Park; Thabo Beeler; Derek Bradley; Siyu Tang; Otmar Hilliges; | In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. |
206 | Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors | Viktor Larsson; Nicolas Zobernig; Kasim Taskin; Marc Pollefeys; | In this paper we consider the problem of Structure-from-Motion from images with unknown intrinsic calibration. |
207 | Occupancy Anticipation for Efficient Exploration and Navigation | Santhosh K. Ramakrishnan; Ziad Al-Halah; Kristen Grauman; | We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. |
208 | Unified Image and Video Saliency Modeling | Richard Droste; Jianbo Jiao; J. Alison Noble; | To address this we propose four novel domain adaptation techniques – Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN – in addition to an improved formulation of learned Gaussian priors. |
209 | TAO: A Large-Scale Benchmark for Tracking Any Object | Achal Dave; Tarasha Khurana; Pavel Tokmakov; Cordelia Schmid; Deva Ramanan; | To bridge this gap, we introduce a similarly diverse dataset for Tracking Any Object (TAO). |
210 | A Generalization of Otsu’s Method and Minimum Error Thresholding | Jonathan T. Barron; | We present Generalized Histogram Thresholding (GHT), a simple, fast, and effective technique for histogram-based image thresholding. |
211 | A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks | Unnat Jain; Luca Weihs; Eric Kolve; Ali Farhadi; Svetlana Lazebnik; Aniruddha Kembhavi; Alexander Schwing; | Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. |
212 | Big Transfer (BiT): General Visual Representation Learning | Alexander Kolesnikov; Lucas Beyer; Xiaohua Zhai; Joan Puigcerver; Jessica Yung; Sylvain Gelly; Neil Houlsby; | We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). |
213 | VisualCOMET: Reasoning about the Dynamic Context of a Still Image | Jae Sung Park; Chandra Bhagavatula; Roozbeh Mottaghi; Ali Farhadi; Yejin Choi; | We propose Visual COMET, the novel framework of visual common-sense reasoning tasks to predict events that might have happened before, events that might happen next, and the intents of the people at present. |
214 | Few-shot Action Recognition with Permutation-invariant Attention | Hongguang Zhang; Li Zhang; Xiaojuan Qi; Hongdong Li; Philip H. S. Torr; Piotr Koniusz; | Many few-shot learning models focus on recognising images. In contrast, we tackle a challenging task of few-shot action recognition from videos. |
215 | Character Grounding and Re-Identification in Story of Videos and Text Descriptions | Youngjae Yu; Jongseok Kim; Heeseung Yun; Jiwan Chung; Gunhee Kim; | In order to solve these related tasks in a mutually rewarding way, we propose a model named Character in Story Identification Network (CiSIN). |
216 | AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling | Wenshuo Ma; Tingzhong Tian; Hang Xu; Yimin Huang; Zhenguo Li; | In this paper, we study the problem of automatically optimizing anchor boxes for object detection. |
217 | Learning Visual Context by Comparison | Minchul Kim; Jongchan Park; Seil Na; Chang Min Park; Donggeun Yoo; | In this paper, we present Attend-and-Compare Module (ACM) for capturing the difference between an object of interest and its corresponding context. |
218 | Large Scale Holistic Video Understanding | Ali Diba; Mohsen Fayyaz; Vivek Sharma; Manohar Paluri; Jürgen Gall; Rainer Stiefelhagen; Luc Van Gool; | We fill this gap by presenting a large-scale “Holistic Video Understanding Dataset” (HVU). |
219 | Indirect Local Attacks for Context-aware Semantic Segmentation Networks | Krishna Kanth Nakka; Mathieu Salzmann; | To this end, we introduce an indirect attack strategy, namely adaptive local attacks, aiming to find the best image location to perturb, while preserving the labels at this location and producing a realistic-looking segmentation map. |
220 | Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings | Anita Rau; Guillermo Garcia-Hernando; Danail Stoyanov; Gabriel J. Brostow; Daniyar Turmukhambetov; | While we don’t obviate the need for geometric verification, we propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup. |
221 | Connecting Vision and Language with Localized Narratives | Jordi Pont-Tuset; Jasper Uijlings; Soravit Changpinyo; Radu Soricut; Vittorio Ferrari; | We propose Localized Narratives, a new form of multimodal image annotations connecting vision and language. |
222 | Adversarial T-shirt! Evading Person Detectors in A Physical World | Kaidi Xu; Gaoyuan Zhang; Sijia Liu; Quanfu Fan; Mengshu Sun; Hongge Chen; Pin-Yu Chen; Yanzhi Wang; Xue Lin; | In this work, we proposed adversarial T-shirts, a robust physical adversarial example for evading person detectors even if it could undergo non-rigid deformation due to a moving person’s pose changes. |
223 | Bounding-box Channels for Visual Relationship Detection | Sho Inayoshi; Keita Otani; Antonio Tejero-de-Pablos; Tatsuya Harada; | In this paper, we propose the bounding-box channels, a novel architecture capable of relating the semantic, spatial, and image features strongly. |
224 | Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion | Zuzana Kukelova; Cenek Albl; Akihiro Sugimoto; Konrad Schindler; Tomas Pajdla; | We present the first minimal solutions for the absolute pose of a rolling shutter camera with unknown rolling shutter parameters, focal length, and radial distortion. |
225 | SRFlow: Learning the Super-Resolution Space with Normalizing Flow | Andreas Lugmayr; Martin Danelljan; Luc Van Gool; Radu Timofte; | In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. |
226 | DeepGMR: Learning Latent Gaussian Mixture Models for Registration | Wentao Yuan; Benjamin Eckart; Kihwan Kim; Varun Jampani; Dieter Fox; Jan Kautz; | In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method that explicitly leverages a probabilistic registration paradigm by formulating registration as the minimization of KL-divergence between two probability distributions modeled as mixtures of Gaussians. |
227 | Active Perception using Light Curtains for Autonomous Driving | Siddharth Ancha; Yaadhav Raaj; Peiyun Hu; Srinivasa G. Narasimhan; David Held; | In this work, we propose a method for 3D object recognition using light curtains, a resource-efficient active sensor that measures depth at selected locations in the environment in a controllable manner. |
228 | Invertible Neural BRDF for Object Inverse Rendering | Zhe Chen; Shohei Nobuhara; Ko Nishino; | We introduce a novel neural network-based BRDF model and a Bayesian framework for object inverse rendering, i.e., joint estimation of reflectance and natural illumination from a single image of an object of known geometry. |
229 | Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network | Wenfeng Luo; Meng Yang; | To fully explore the potential of the weak labels, we propose to impose separate treatments of strong and weak annotations via a strong-weak dual-branch network, which discriminates the massive inaccurate weak supervisions from those strong ones. |
230 | Practical Deep Raw Image Denoising on Mobile Devices | Yuzhi Wang; Haibin Huang; Qin Xu; Jiaming Liu; Yiqun Liu; Jue Wang; | In this work, we propose a light-weight, efficient neural network-based raw image denoiser that runs smoothly on mainstream mobile devices, and produces high quality denoising results. |
231 | SoundSpaces: Audio-Visual Navigation in 3D Environments | Changan Chen; Unnat Jain; Carl Schissler; Sebastia Vicenc Amengual Gari; Ziad Al-Halah; Vamsi Krishna Ithapu; Philip Robinson; Kristen Grauman; | We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. |
232 | Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization | Yuanhao Zhai; Le Wang; Wei Tang; Qilin Zhang; Junsong Yuan; Gang Hua; | In this paper, we present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges. |
233 | Erasing Appearance Preservation in Optimization-based Smoothing | Lvmin Zhang; Chengze Li; Yi Ji; Chunping Liu; Tien-tsin Wong; | In this paper, we call this manipulation Erasing Appearance Preservation (EAP). |
234 | Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler | Tsu-Jui Fu; Xin Eric Wang; Matthew F. Peterson; Scott T. Grafton; Miguel P. Eckstein; William Yang Wang; | We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data. |
235 | Guided Deep Decoder: Unsupervised Image Pair Fusion | Tatsumi Uezato; Danfeng Hong; Naoto Yokoya; Wei He; | To address this limitation, in this study, we propose a guided deep decoder network as a general prior. |
236 | Filter Style Transfer between Photos | Jonghwa Yim; Jisung Yoo; Won-joon Do; Beomsu Kim; Jihwan Choe; | In this paper, we introduce a new concept of style transfer, Filter Style Transfer (FST). |
237 | JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image | Linpu Fang; Xingyan Liu; Li Liu; Hang Xu; Wenxiong Kang; | In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues. |
238 | Dynamic Group Convolution for Accelerating Convolutional Neural Networks | Zhuo Su; Linpu Fang; Wenxiong Kang; Dewen Hu; Matti Pietikäinen; Li Liu; | In this paper, we propose dynamic group convolution (DGC) that adaptively selects which part of input channels to be connected within each group for individual samples on the fly. |
239 | RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering | Yaoxiong Huang; Mengchao He; Lianwen Jin; Yongpan Wang; | In this paper, a novel radical decomposition-and-rendering-based GAN (RD-GAN) is proposed to utilize the radical-level compositions of Chinese characters and achieve few-shot/zero-shot Chinese character style transfer. |
240 | Object-Contextual Representations for Semantic Segmentation | Yuhui Yuan; Xilin Chen; Jingdong Wang; | In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. |
241 | Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring | Zhihang Zhong; Ye Gao; Yinqiang Zheng; Bo Zheng; | To improve the network efficiency, we adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame. |
242 | Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed | Steffen Wolf; Yuyan Li; Constantin Pape; Alberto Bailoni; Anna Kreshuk; Fred A. Hamprecht; | We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm. |
243 | Photon-Efficient 3D Imaging with A Non-Local Neural Network | Jiayong Peng; Zhiwei Xiong; Xin Huang; Zheng-Ping Li; Dong Liu; Feihu Xu; | In this paper, we first analyze the long-range correlations in both spatial and temporal dimensions of the measurements. Then we propose a non-local neural network for depth reconstruction by exploiting the long-range correlations. |
244 | GeLaTO: Generative Latent Textured Objects | Ricardo Martin-Brualla; Rohit Pandey; Sofien Bouaziz; Matthew Brown; Dan B Goldman; | Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance. |
245 | Improving Vision-and-Language Navigation with Image-Text Pairs from the Web | Arjun Majumdar; Ayush Shrivastava; Stefan Lee; Peter Anderson; Devi Parikh; Dhruv Batra; | Specifically, we develop VLN-BERT, a visiolinguistic transformer-based model for scoring the compatibility between an instruction (‘…stop at the brown sofa’) and a trajectory of panoramic RGB images captured by the agent. |
246 | Directional Temporal Modeling for Action Recognition | Xinyu Li; Bing Shuai; Joseph Tighe; | In this paper, we introduce a channel independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features. |
247 | Shonan Rotation Averaging: Global Optimality by Surfing SO(p)^n | Frank Dellaert; David M. Rosen; Jing Wu; Robert Mahony; Luca Carlone; | Our method employs semidefinite relaxation in order to recover provably globally optimal solutions of the rotation averaging problem. |
248 | Semantic Curiosity for Active Visual Learning | Devendra Singh Chaplot; Helen Jiang; Saurabh Gupta; Abhinav Gupta; | In this paper, we study the task of embodied interactive learning for object detection. |
249 | Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training | Dongwon Park; Dong Un Kang; Jisoo Kim; Se Young Chun; | To realize the MT approach, we propose progressive deblurring over iterations and incremental temporal training with temporally augmented training data. |
250 | ProgressFace: Scale-Aware Progressive Learning for Face Detection | Jiashu Zhu; Dong Li; Tiantian Han; Lu Tian; Yi Shan; | In this work, we propose a novel scale-aware progressive training mechanism to address large scale variations across faces. |
251 | Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference | Erik Nijkamp; Bo Pang; Tian Han; Linqi Zhou; Song-Chun Zhu; Ying Nian Wu; | In this paper, we propose to use noise initialized non-persistent short run MCMC, such as finite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution. |
252 | CoTeRe-Net: Discovering Collaborative Ternary Relations in Videos | Zhensheng Shi; Cheng Guan; Liangjie Cao; Qianqian Li; Ju Liang; Zhaorui Gu; Haiyong Zheng; Bing Zheng; | In this paper, we propose a novel relation model that discovers relations of both implicit and explicit cues as well as their collaboration in videos. |
253 | Modeling the Effects of Windshield Refraction for Camera Calibration | Frank Verbiest; Marc Proesmans; Luc Van Gool; | In this paper, we study the effects of windshield refraction for autonomous driving applications. |
254 | Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images through Generative Latent Search | Prashant Pandey; Aayush Kumar Tyagi; Sameer Ambekar; Prathosh AP; | We propose a method for target-independent segmentation where the ‘nearest-clone’ of a target image in the source domain is searched and used as a proxy in the segmentation network trained only on the source domain. |
255 | PROFIT: A Novel Training Method for sub-4-bit MobileNet Models | Eunhyeok Park; Sungjoo Yoo; | In this work, we report that the activation instability induced by weight quantization (AIWQ) is the key obstacle to sub-4-bit quantization of mobile networks. |
256 | Visual Relation Grounding in Videos | Junbin Xiao; Xindi Shang; Xun Yang; Sheng Tang; Tat-Seng Chua; | In this paper, we explore a novel task named visual Relation Grounding in Videos (vRGV). |
257 | Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows | Andrei Zanfir; Eduard Gabriel Bazavan; Hongyi Xu; William T. Freeman; Rahul Sukthankar; Cristian Sminchisescu; | In this paper we present new priors as well as large-scale weakly supervised models for 3D human pose and shape estimation. |
258 | Controlling Style and Semantics in Weakly-Supervised Image Generation | Dario Pavllo; Aurelien Lucchi; Thomas Hofmann; | We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene. |
259 | Jointly learning visual motion and confidence from local patches in event cameras | Daniel R. Kepple; Daewon Lee; Colin Prepsius; Volkan Isler; Il Memming Park; Daniel D. Lee; | We propose the first network to jointly learn visual motion and confidence from events in spatially local patches. |
260 | SODA: Story Oriented Dense Video Captioning Evaluation Framework | Soichiro Fujita; Tsutomu Hirao; Hidetaka Kamigaito; Manabu Okumura; Masaaki Nagata; | This paper proposes a new evaluation framework, Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems. |
261 | Sketch-Guided Object Localization in Natural Images | Aditay Tripathi; Rajath R. Dani; Anand Mishra; Anirban Chakraborty; | We introduce a novel problem of localizing all the instances of an object (seen or unseen during training) in a natural image via sketch query. |
262 | A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses | Malik Boudiaf; Jérôme Rony; Imtiaz Masud Ziko; Eric Granger; Marco Pedersoli; Pablo Piantanida; Ismail Ben Ayed; | However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. |
263 | Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | Jize Cao; Zhe Gan; Yu Cheng; Licheng Yu; Yen-Chun Chen; Jingjing Liu; | To reveal the secrets behind the scene, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection) generalizable to standard pre-trained V+L models, to decipher the inner workings of multimodal pre-training (e.g., implicit knowledge garnered in individual attention heads, inherent cross-modal alignment learned through contextualized multimodal embeddings). |
264 | The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement | William Peebles; John Peebles; Jun-Yan Zhu; Alexei Efros; Antonio Torralba; | In this paper, we propose the Hessian Penalty, a simple regularization function that encourages the input Hessian of a function to be diagonal. |
265 | STAR: Sparse Trained Articulated Human Body Regressor | Ahmed A. A. Osman; Timo Bolkart; Michael J. Black; | To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters to 20% of SMPL. |
266 | Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer | Xinghao Chen; Yiman Zhang; Yunhe Wang; Han Shu; Chunjing Xu; Chang Xu; | This paper proposes to learn a lightweight video style transfer network via knowledge distillation paradigm. |
267 | Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning | Sihui Luo; Wenwen Pan; Xinchao Wang; Dazhou Wang; Haihong Tang; Mingli Song; | In this paper, we study how to reuse such heterogeneous pre-trained models as teachers, and build a versatile and compact student model, without accessing human annotations. |
268 | Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians | Shizhen Zhao; Changxin Gao; Jun Zhang; Hao Cheng; Chuchu Han; Xinyang Jiang; Xiaowei Guo; Wei-Shi Zheng; Nong Sang; Xing Sun; | To address this problem, this paper presents a novel deep network termed Pedestrian-Interference Suppression Network (PISNet). |
269 | Learning 3D Part Assembly from a Single Image | Yichen Li; Kaichun Mo; Lin Shao; Minhyuk Sung; Leonidas Guibas; | Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. |
270 | PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions | Kaichun Mo; He Wang; Xinchen Yan; Leonidas Guibas; | In order to learn such a conditional shape generation procedure in an end-to-end fashion, we propose a conditional GAN “part tree”-to-“point cloud” model (PT2PC) that disentangles the structural and geometric factors. |
271 | Highly Efficient Salient Object Detection with 100K Parameters | Shang-Hua Gao; Yong-Qiang Tan; Ming-Ming Cheng; Chengze Lu; Yunpeng Chen; Shuicheng Yan; | In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree. |
272 | HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing | Qili Deng; Ziling Huang; Chung-Chi Tsai; Chia-Wen Lin; | In this paper, we present a Haze-Aware Representation Distillation Generative Adversarial Network named HardGAN for single-image dehazing. |
273 | Lifespan Age Transformation Synthesis | Roy Or-El; Soumyadip Sengupta; Ohad Fried; Eli Shechtman; Ira Kemelmacher-Shlizerman; | We propose a new multi domain image-to-image generative adversarial network architecture, whose learned latent space accurately models the continuous aging process in both directions. |
274 | Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation | Xingchao Peng; Yichen Li; Kate Saenko; | To describe and learn relations between different domains, we propose a novel Domain2Vec model to provide vectorial representations of visual domains based on joint learning of feature disentanglement and Gram matrix. |
275 | Simulating Content Consistent Vehicle Datasets with Attribute Descent | Yue Yao; Liang Zheng; Xiaodong Yang; Milind Naphade; Tom Gedeon; | We propose an attribute descent approach to let VehicleX approximate the attributes in real-world datasets. |
276 | Multiview Detection with Feature Perspective Transformation | Yunzhong Hou; Liang Zheng; Stephen Gould; | To address these questions, we introduce a novel multiview detector, MVDet. |
277 | Learning Object Relation Graph and Tentative Policy for Visual Navigation | Heming Du; Xin Yu; Liang Zheng; | Aiming to improve these two components, this paper proposes three complementary techniques, object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN). |
278 | Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition | Chenyang Si; Xuecheng Nie; Wei Wang; Liang Wang; Tieniu Tan; Jiashi Feng; | To address these issues, we present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme via neighbor relation exploration and adversarial learning. |
279 | Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning | Liad Pollak Zuckerman; Eyal Naor; George Pisha; Shai Bagon; Michal Irani; | In this paper we propose a “Deep Internal Learning” approach for true TSR. |
280 | Inducing Optimal Attribute Representations for Conditional GANs | Binod Bhattarai; Tae-Kyun Kim; | We propose a novel end-to-end learning framework based on Graph Convolutional Networks to learn the attribute representations to condition the generator. |
281 | AR-Net: Adaptive Frame Resolution for Efficient Action Recognition | Yue Meng; Chung-Ching Lin; Rameswar Panda; Prasanna Sattigeri; Leonid Karlinsky; Aude Oliva; Kate Saenko; Rogerio Feris; | In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition in long untrimmed videos. |
282 | Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation | Vladimir V. Kniaz; Vladimir A. Knyaz; Fabio Remondino; Artem Bordodymov; Petr Moshkantsev; | We propose a single shot image-to-semantic voxel model translation framework. We collected a SemanticVoxels dataset with 116k images, ground-truth semantic voxel models, depth maps, and 6D object poses. |
283 | Consistency Guided Scene Flow Estimation | Yuhua Chen; Luc Van Gool; Cordelia Schmid; Cristian Sminchisescu; | The model takes two temporal stereo pairs as input, and predicts disparity and scene flow. |
284 | Autoregressive Unsupervised Image Segmentation | Yassine Ouali; Céline Hudelot; Myriam Tami; | In this work, we propose a new unsupervised image segmentation approach based on mutual information maximization between different constructed views of the inputs. |
285 | Controllable Image Synthesis via SegVAE | Yen-Chi Cheng; Hsin-Ying Lee; Min Sun; Ming-Hsuan Yang; | In this work, we specifically target generating semantic maps given a label-set consisting of desired categories. |
286 | Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search | Yuan Tian; Qin Wang; Zhiwu Huang; Wen Li; Dengxin Dai; Minghao Yang; Jun Wang; Olga Fink; | In this paper, we introduce a new reinforcement learning (RL) based neural architecture search (NAS) methodology for effective and efficient generative adversarial network (GAN) architecture search. |
287 | Efficient Non-Line-of-Sight Imaging from Transient Sinograms | Mariko Isogawa; Dorian Chan; Ye Yuan; Kris Kitani; Matthew O’Toole; | We propose a circular and confocal non-line-of-sight (C$^2$NLOS) scan that involves illuminating and imaging a common point, and scanning this point in a circular path along a wall. |
288 | Texture Hallucination for Large-Factor Painting Super-Resolution | Yulun Zhang; Zhifei Zhang; Stephen DiVerdi; Zhaowen Wang; Jose Echevarria; Yun Fu; | We aim to super-resolve digital paintings, synthesizing realistic details from high-resolution reference painting materials for very large scaling factors (e.g., 8$\times$, 16$\times$). |
289 | Learning Progressive Joint Propagation for Human Motion Prediction | Yujun Cai; Lin Huang; Yiwei Wang; Tat-Jen Cham; Jianfei Cai; Junsong Yuan; Jun Liu; Xu Yang; Yiheng Zhu; Xiaohui Shen; Ding Liu; Jing Liu; Nadia Magnenat Thalmann; | In this paper, we address this problem in three aspects. First, to capture the long-range spatial correlations and temporal dependencies, we apply a transformer-based architecture with the global attention mechanism. |
290 | Image Stitching and Rectification for Hand-Held Cameras | Bingbing Zhuang; Quoc-Huy Tran; | In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. |
291 | ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds | Gopal Sharma; Difan Liu; Subhransu Maji; Evangelos Kalogerakis; Siddhartha Chaudhuri; Radomír Měch; | We propose a novel, end-to-end trainable, deep network called ParSeNet |
292 | The Group Loss for Deep Metric Learning | Ismail Elezi; Sebastiano Vascon; Alessandro Torcinovich; Marcello Pelillo; Laura Leal-Taixé; | We propose Group Loss, a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group while promoting, at the same time, low-density regions amongst data points belonging to different groups. |
293 | Learning Object Depth from Camera Motion and Video Object Segmentation | Brent A. Griffin; Jason J. Corso; | To leverage this progress in 3D applications, this paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). |
294 | OnlineAugment: Online Data Augmentation with Less Domain Knowledge | Zhiqiang Tang; Yunhe Gao; Leonid Karlinsky; Prasanna Sattigeri; Rogerio Feris; Dimitris Metaxas; | In this work, we offer an orthogonal online data augmentation scheme together with three new augmentation networks, co-trained with the target learning task. |
295 | Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction | Yiming Qian; Yasutaka Furukawa; | This paper proposes a novel single-image piecewise planar reconstruction technique that infers and enforces inter-plane relationships. |
296 | Intra-class Feature Variation Distillation for Semantic Segmentation | Yukang Wang; Wei Zhou; Tao Jiang; Xiang Bai; Yongchao Xu; | In this paper, different from previous methods performing knowledge distillation for densely pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student). |
297 | Temporal Distinct Representation Learning for Action Recognition | Junwu Weng; Donghao Luo; Yabiao Wang; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Xudong Jiang; Junsong Yuan; | In this paper, we attempt to tackle this issue through two ways. 1) Design a sequential channel filtering mechanism, Progressive Enhancement Module (PEM), to excite the discriminative channels of features from different frames step by step, and thus avoid repeated information extraction. 2) Create a Temporal Diversity Loss (TD Loss) to force the kernels to concentrate on and capture the variations among frames rather than the image regions with similar appearance. |
298 | Representative Graph Neural Network | Changqian Yu; Yifan Liu; Changxin Gao; Chunhua Shen; Nong Sang; | In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy. |
299 | Deformation-Aware 3D Model Embedding and Retrieval | Mikaela Angelina Uy; Jingwei Huang; Minhyuk Sung; Tolga Birdal; Leonidas Guibas; | We introduce a new problem of retrieving 3D models that are deformable to a given query shape and present a novel deep deformation-aware embedding to solve this retrieval task. |
300 | Atlas: End-to-End 3D Scene Reconstruction from Posed Images | Zak Murez; Tarrence van As; James Bartolozzi; Ayan Sinha; Vijay Badrinarayanan; Andrew Rabinovich; | We present an end-to-end 3D reconstruction of a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images. |
301 | Multiple Class Novelty Detection Under Data Distribution Shift | Poojan Oza; Hien V. Nguyen; Vishal M. Patel; | To this end, we consider the problem of multiple class novelty detection under dataset distribution shift to improve the novelty detection performance. |
302 | Colorization of Depth Map via Disentanglement | Chung-Sheng Lai; Zunzhi You; Ching-Chun Huang; Yi-Hsuan Tsai; Wei-Chen Chiu; | In this paper, we propose a depth map colorization method via disentangling appearance and structure factors, so that our model could 1) learn depth-invariant appearance features from an appearance reference and 2) generate colorized images by combining a given depth map and the appearance feature obtained from any reference. |
303 | Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes | Johanna Wald; Torsten Sattler; Stuart Golodetz; Tommaso Cavallari; Federico Tombari; | In this paper, we adapt 3RScan — a recently introduced indoor RGB-D dataset designed for object instance re-localization — to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes. |
304 | GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end | Ahmed Samy Nassar; Stefano D’Aronco; Sébastien Lefèvre; Jan D. Wegner; | In this paper, we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position per object. |
305 | Localizing the Common Action Among a Few Videos | Pengwan Yang; Vincent Tao Hu; Pascal Mettes; Cees G. M. Snoek; | To address this task, we introduce a new 3D convolutional network architecture able to align representations from the support videos with the relevant query video segments. |
306 | TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification | Moshe Lichtenstein; Prasanna Sattigeri; Rogerio Feris; Raja Giryes; Leonid Karlinsky; | In this paper we propose yet another simple technique that is important for the few shot learning performance – a search for a compact feature sub-space that is discriminative for a given few-shot test task. |
307 | Traffic Accident Benchmark for Causality Recognition | Tackgeun You; Bohyung Han; | We propose a brand new benchmark for analyzing causality in traffic accident videos by decomposing an accident into a pair of events, cause and effect. |
308 | Face Anti-Spoofing with Human Material Perception | Zitong Yu; Xiaobai Li; Xuesong Niu; Jingang Shi; Guoying Zhao; | In this paper we rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception, intending to extract discriminative and robust features for FAS. |
309 | How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction | Huikun Bi; Ruisi Zhang; Tianlu Mao; Zhigang Deng; Zhaoqi Wang; | This work presents a novel First-person View based Trajectory predicting model (FvTraj) to estimate the future trajectories of pedestrians in a scene given their observed trajectories and the corresponding first-person view images. |
310 | Multiple Expert Brainstorming for Domain Adaptive Person Re-identification | Yunpeng Zhai; Qixiang Ye; Shijian Lu; Mengxi Jia; Rongrong Ji; Yonghong Tian; | In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction about model ensemble problem under unsupervised conditions. |
311 | NASA Neural Articulated Shape Approximation | Boyang Deng; JP Lewis; Timothy Jeruzalski; Gerard Pons-Moll; Geoffrey Hinton; Mohammad Norouzi; Andrea Tagliasacchi; | This paper introduces neural articulated shape approximation (NASA), an alternative framework that enables efficient representation of articulated deformable objects using neural indicator functions that are conditioned on pose. |
312 | Towards Unique and Informative Captioning of Images | Zeyu Wang; Berthy Feng; Karthik Narasimhan; Olga Russakovsky; | We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be ‘topped’ using simple captioning systems relying on object detectors. |
313 | When Does Self-supervision Improve Few-shot Learning? | Jong-Chyi Su; Subhransu Maji; Bharath Hariharan; | Based on this analysis we present a technique that automatically selects images for SSL from a large, generic pool of unlabeled images for a given dataset that provides further improvements. |
314 | Two-branch Recurrent Network for Isolating Deepfakes in Videos | Iacopo Masi; Aditya Killekar; Royston Marian Mascarenhas; Shenoy Pratik Gurudatt; Wael AbdAlmageed; | We present a method for deepfake detection based on a two-branch network structure that isolates digitally manipulated faces by learning to amplify artifacts while suppressing the high-level face content. |
315 | Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment | Qing Liu; Orchid Majumder; Alessandro Achille; Avinash Ravichandran; Rahul Bhotika; Stefano Soatto; | We propose a method to train a model so it can learn new classification tasks while improving with each task solved. |
316 | BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models | Jiahui Yu; Pengchong Jin; Hanxiao Liu; Gabriel Bender; Pieter-Jan Kindermans; Mingxing Tan; Thomas Huang; Xiaodan Song; Ruoming Pang; Quoc Le; | In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. |
317 | Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation | Sheng Jin; Wentao Liu; Enze Xie; Wenhai Wang; Chen Qian; Wanli Ouyang; Ping Luo; | In this paper, we investigate a new perspective of human part grouping and reformulate it as a graph clustering task. |
318 | Global Distance-distributions Separation for Unsupervised Person Re-identification | Xin Jin; Cuiling Lan; Wenjun Zeng; Zhibo Chen; | To address this problem, we introduce a global distance-distributions separation (GDS) constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view. |
319 | I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image | Gyeongsik Moon; Kyoung Mu Lee; | To resolve the above issues, we propose I2L-MeshNet, an image-to-lixel(line+pixel) prediction network. |
320 | Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose | Hongsuk Choi; Gyeongsik Moon; Kyoung Mu Lee; | To overcome the above weaknesses, we propose Pose2Mesh, a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose. |
321 | ALRe: Outlier Detection for Guided Refinement | Mingzhu Zhu; Zhang Gao; Junzhi Yu; Bingwei He; Jiantao Liu; | In this paper, we propose a general outlier detection method for guided refinement. |
322 | Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations | Yifan Yang; Guorong Li; Zhe Wu; Li Su; Qingming Huang; Nicu Sebe; | In this paper, we propose a weakly-supervised counting network, which directly regresses the crowd numbers without the location supervision. |
323 | Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition | Wen Ji; Kelei He; Jing Huo; Zheng Gu; Yang Gao; | To facilitate research in attribute learning of caricatures, we propose a caricature attribute dataset, namely WebCariA. |
324 | Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection | Carlo Biffi; Steven McDonagh; Philip Torr; Aleš Leonardis; Sarah Parisot; | Towards solving this problem we introduce, for the first time, an online annotation module (OAM) that learns to generate a many-shot set of reliable annotations from a larger volume of weakly labelled images. |
325 | Curriculum DeepSDF | Yueqi Duan; Haidong Zhu; He Wang; Li Yi; Ram Nevatia; Leonidas J. Guibas; | In this paper, we design a “shape curriculum” for learning continuous Signed Distance Function (SDF) on shapes, namely Curriculum DeepSDF. |
326 | Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance | Minghua Liu; Xiaoshuai Zhang; Hao Su; | Instead, we propose to leverage the input point cloud as much as possible, by only adding connectivity information to existing points. |
327 | Improved Adversarial Training via Learned Optimizer | Yuanhao Xiong; Cho-Jui Hsieh; | In this paper, we empirically demonstrate that the commonly used PGD attack may not be optimal for inner maximization, and improved inner optimizer can lead to a more robust model. |
328 | Component Divide-and-Conquer for Real-World Image Super-Resolution | Pengxu Wei; Ziwei Xie; Hannan Lu; Zongyuan Zhan; Qixiang Ye; Wangmeng Zuo; Liang Lin; | In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding SR model with low-level image components. |
329 | Enabling Deep Residual Networks for Weakly Supervised Object Detection | Yunhang Shen; Rongrong Ji; Yan Wang; Zhiwei Chen; Feng Zheng; Feiyue Huang; Yunsheng Wu; | In this paper, we discover the intrinsic root with sophisticated analysis and propose a sequence of design principles to take full advantage of deep residual learning for WSOD from the perspectives of adding redundancy, improving robustness and aligning features. |
330 | Deep near-light photometric stereo for spatially varying reflectances | Hiroaki Santo; Michael Waechter; Yasuyuki Matsushita; | This paper presents a near-light photometric stereo method for spatially varying reflectances. |
331 | Learning Visual Representations with Caption Annotations | Mert Bulent Sariyildiz; Julien Perez; Diane Larlus; | To tackle this task, we propose hybrid models, with dedicated visual and textual encoders, and we show that the visual representations learned as a by-product of solving this task transfer well to a variety of target tasks. |
332 | Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier | Tz-Ying Wu; Pedro Morgado; Pei Wang; Chih-Hui Ho; Nuno Vasconcelos; | Motivated by this, a deep realistic taxonomic classifier (Deep-RTC) is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions. |
333 | Regression of Instance Boundary by Aggregated CNN and GCN | Yanda Meng; Wei Meng; Dongxu Gao; Yitian Zhao; Xiaoyun Yang; Xiaowei Huang; Yalin Zheng; | This paper proposes a straightforward, intuitive deep learning approach for (biomedical) image segmentation tasks. |
334 | Social Adaptive Module for Weakly-supervised Group Activity Recognition | Rui Yan; Lingxi Xie; Jinhui Tang; Xiangbo Shu; Qi Tian; | This paper presents a new task named weakly-supervised group activity recognition (GAR) which differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data. |
335 | RGB-D Salient Object Detection with Cross-Modality Modulation and Selection | Chongyi Li; Runmin Cong; Yongri Piao; Qianqian Xu; Chen Change Loy; | We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD). |
336 | RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval | Hung-Yu Tseng; Hsin-Ying Lee; Lu Jiang; Ming-Hsuan Yang; Weilong Yang; | In this work, we aim to synthesize images from scene description with retrieved patches as reference. |
337 | Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection | Dongzhan Zhou; Xinchi Zhou; Hongwen Zhang; Shuai Yi; Wanli Ouyang; | In this paper, we propose a general and efficient pre-training paradigm, Montage pre-training, for object detection. |
338 | Faster Person Re-Identification | Guan’an Wang; Shaogang Gong; Jian Cheng; Zengguang Hou; | In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy. |
339 | Quantization Guided JPEG Artifact Correction | Max Ehrlich; Ser-Nam Lim; Larry Davis; Abhinav Shrivastava; | We solve this problem by creating a novel architecture which is parameterized by the JPEG file’s quantization matrix. |
340 | 3PointTM: Faster Measurement of High-Dimensional Transmission Matrices | Yujun Chen; Manoj Kumar Sharma; Ashutosh Sabharwal; Ashok Veeraraghavan; Aswin C. Sankaranarayanan; | In this paper, we propose 3PointTM, an approach for sensing TMs that uses a minimal number of measurements per pixel – reducing the measurement budget by a factor of two as compared to state of the art in phase-shifting holography for measuring TMs – and has a low computational complexity as compared to phase retrieval. |
341 | Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer | Xide Xia; Meng Zhang; Tianfan Xue; Zheng Sun; Hui Fang; Brian Kulis; Jiawen Chen; | We propose a new end-to-end model for photorealistic style transfer that is fast and inherently generates photorealistic results. |
342 | Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction | Xiangyu Zhu; Fan Yang; Di Huang; Chang Yu; Hao Wang; Jianzhu Guo; Zhen Lei; Stan Z. Li; | Secondly, we propose a Fine-Grained reconstruction Network (FGNet) that can concentrate on shape modification by warping the network input and output to the UV space. |
343 | World-Consistent Video-to-Video Synthesis | Arun Mallya; Ting-Chun Wang; Karan Sapra; Ming-Yu Liu; | In this work, we propose a framework for utilizing all past generated frames when synthesizing each frame. |
344 | Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation | Qi Fan; Lei Ke; Wenjie Pei; Chi-Keung Tang; Yu-Wing Tai; | We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories. |
345 | GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild | Umberto Michieli; Edoardo Borsato; Luca Rossi; Pietro Zanuttigh; | In this work, we propose a novel framework combining higher object-level context conditioning and part-level spatial relationships to address the task. |
346 | Event-based Asynchronous Sparse Convolutional Networks | Nico Messikommer; Daniel Gehrig; Antonio Loquercio; Davide Scaramuzza; | In this work, we present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output, thus directly leveraging the intrinsic asynchronous and sparse nature of the event data. |
347 | AtlantaNet: Inferring the 3D Indoor Layout from a Single 360° Image beyond the Manhattan World Assumption | Giovanni Pintore; Marco Agus; Enrico Gobbetti; | We introduce a novel end-to-end approach to predict a 3D room layout from a single panoramic image. |
348 | AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification | Xiaofang Wang; Xuehan Xiong; Maxim Neumann; AJ Piergiovanni; Michael S. Ryoo; Anelia Angelova; Kris M. Kitani; Wei Hua; | We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell. |
349 | REMIND Your Neural Network to Prevent Catastrophic Forgetting | Tyler L. Hayes; Kushal Kafle; Robik Shrestha; Manoj Acharya; Christopher Kanan; | Here, we propose REMIND, a brain-inspired approach that enables efficient replay with compressed representations. |
350 | Image Classification in the Dark using Quanta Image Sensors | Abhiram Gnanasambandam; Stanley H. Chan; | In this paper, we present a new low-light image classification solution using Quanta Image Sensors (QIS). |
351 | n-Reference Transfer Learning for Saliency Prediction | Yan Luo; Yongkang Wong; Mohan S. Kankanhalli; Qi Zhao; | To solve this problem, we propose a few-shot transfer learning paradigm for saliency prediction, which enables efficient transfer of knowledge learned from the existing large-scale saliency datasets to a target domain with limited labeled examples. |
352 | Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection | Shuhan Chen; Yun Fu; | In this paper, we aim to develop an efficient and compact deep network for RGB-D salient object detection, where the depth image provides complementary information to boost performance in complex scenarios. |
353 | Bottom-Up Temporal Action Localization with Mutual Regularization | Peisen Zhao; Lingxi Xie; Chen Ju; Ya Zhang; Yanfeng Wang; Qi Tian; | To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases. |
354 | On Modulating the Gradient for Meta-Learning | Christian Simon; Piotr Koniusz; Richard Nock; Mehrtash Harandi; | Inspired by optimization techniques, we propose a novel meta-learning algorithm with gradient modulation to encourage fast-adaptation of neural networks in the absence of abundant data. |
355 | Domain-Specific Mappings for Generative Adversarial Style Transfer | Hsin-Yu Chang; Zhixiang Wang; Yung-Yu Chuang; | For addressing this issue, this paper leverages domain-specific mappings for remapping latent features in the shared content space to domain-specific content spaces. |
356 | DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning | Timo Milbich; Karsten Roth; Homanga Bharadhwaj; Samarth Sinha; Yoshua Bengio; Björn Ommer; Joseph Paul Cohen; | To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting. |
357 | DHP: Differentiable Meta Pruning via HyperNetworks | Yawei Li; Shuhang Gu; Kai Zhang; Luc Van Gool; Radu Timofte; | To circumvent this problem, this paper introduces a differentiable pruning method via hypernetworks for automatic network pruning. |
358 | Deep Transferring Quantization | Zheng Xie; Zhiquan Wen; Jing Liu; Zhiqiang Liu; Xixian Wu; Mingkui Tan; | Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model. |
359 | Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification | Guangyi Chen; Yuhao Lu; Jiwen Lu; Jie Zhou; | In this paper, we propose a deep credible metric learning (DCML) method for unsupervised domain adaptation person re-identification. |
360 | Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification? | Guangyi Chen; Yongming Rao; Jiwen Lu; Jie Zhou; | To distill the temporal coherence part of video representation from frame representations, we propose a simple yet effective Adversarial Feature Augmentation (AFA) method, which highlights the temporal coherence features by introducing adversarial augmented temporal motion noise. |
361 | Arbitrary-Oriented Object Detection with Circular Smooth Label | Xue Yang; Junchi Yan; | We design a new rotation detection baseline, to address the boundary problem by transforming angular prediction from a regression problem to a classification task with little accuracy loss, whereby high-precision angle classification is devised in contrast to previous works using coarse-granularity in rotation detection. |
362 | Learning Event-Driven Video Deblurring and Interpolation | Songnan Lin; Jiawei Zhang; Jinshan Pan; Zhe Jiang; Dongqing Zou; Yongtian Wang; Jing Chen; Jimmy Ren; | In this paper, we propose an effective event-driven video deblurring and interpolation algorithm based on deep convolutional neural networks (CNNs). |
363 | Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference | Nelson Nauata; Yasutaka Furukawa; | This paper tackles a 2D architecture vectorization problem, whose task is to infer an outdoor building architecture as a 2D planar graph from a single RGB image. |
364 | Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation | Hang Wang; Minghao Xu; Bingbing Ni; Wenjun Zhang; | To mitigate these problems, we propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework via exploring interactions among domains. |
365 | CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation | Jiahua Dong; Yang Cong; Gan Sun; Yuyang Liu; Xiaowei Xu; | To address above challenges, we develop a new Critical Semantic-Consistent Learning (CSCL) model, which mitigates the discrepancy of both domain-wise and category-wise distributions. |
366 | Prototype Mixture Models for Few-shot Semantic Segmentation | Boyu Yang; Chang Liu; Bohao Li; Jianbin Jiao; Qixiang Ye; | In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation. |
367 | Webly Supervised Image Classification with Self-Contained Confidence | Jingkang Yang; Litong Feng; Weirong Chen; Xiaopeng Yan; Huabin Zheng; Ping Luo; Wayne Zhang; | Inspired by DNNs’ ability on confidence prediction, we introduce self-contained confidence (SCC) by adapting model uncertainty for WSL setting and use it to sample-wisely balance $\mathcal{L}_s$ and $\mathcal{L}_w$. |
368 | Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization | Haibao Yu; Qi Han; Jianbo Li; Jianping Shi; Guangliang Cheng; Bin Fan; | In this paper, we propose a novel soft Barrier Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures all the searched models are inside the valid domain defined by the complexity constraint, thus could return an optimal model under the given constraint by conducting search only one time. |
369 | Monocular 3D Object Detection via Feature Domain Adaptation | Lele Chen; Guofeng Cui; Celong Liu; Zhong Li; Ziyi Kou; Yi Xu; Chenliang Xu; | In this paper, we propose a novel domain adaptation based monocular 3D object detection framework named DA-3Ddet, which adapts the feature from unsound image-based pseudo-LiDAR domain to the accurate real LiDAR domain for performance boosting. |
370 | AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation | Xiaofeng Liu; Tong Che; Yiqun Lu; Chao Yang; Site Li; Jane You; | In the viewer-centered coordinates, we construct an end-to-end trainable conditional variational framework to disentangle the unsupervisely learned relative-pose/rotation and implicit global 3D representation (shape, texture and the origin of viewer-centered coordinates, etc.). |
371 | VPN: Learning Video-Pose Embedding for Activities of Daily Living | Srijan Das; Saurav Sharma; Rui Dai; François Brémond; Monique Thonnat; | In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL). |
372 | Soft Anchor-Point Object Detection | Chenchen Zhu; Fangyi Chen; Zhiqiang Shen; Marios Savvides; | In this work, we boost the performance of the anchor-point detector over the key-point counterparts while maintaining the speed advantage. |
373 | Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid | Jun Gao; Zian Wang; Jinchen Xuan; Sanja Fidler; | We introduce Deformable Grid (Defgrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid such that the edges of the deformed grid align with image boundaries. |
374 | Soft Expert Reward Learning for Vision-and-Language Navigation | Hu Wang; Qi Wu; Chunhua Shen; | In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task. |
375 | Part-aware Prototype Network for Few-shot Semantic Segmentation | Yongfei Liu; Xiangyi Zhang; Songyang Zhang; Xuming He; | In this paper, we propose a novel few-shot semantic segmentation framework based on the prototype representation. |
376 | Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization | Shujun Wang; Lequan Yu; Caizi Li; Chi-Wing Fu; Pheng-Ann Heng; | To this end, we present a new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains. |
377 | Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos | Mahsa Ehsanpour; Alireza Abedin; Fatemeh Saleh; Javen Shi; Ian Reid; Hamid Rezatofighi; | In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task. |
378 | Whole-Body Human Pose Estimation in the Wild | Sheng Jin; Lumin Xu; Jin Xu; Can Wang; Wentao Liu; Chen Qian; Wanli Ouyang; Ping Luo; | To fill in this blank, we introduce COCO-WholeBody which extends COCO dataset with whole-body annotations. |
379 | Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants | Bo Li; Evgeniy Martyushev; Gim Hee Lee; | In this paper, we present a complete comprehensive study of the relative pose estimation problem for a calibrated camera constrained by known $\mathrm{SE}(3)$ invariant, which involves 5 minimal problems in total. |
380 | Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing | Runkai Zheng; Yinqi Zhang; Daolang Huang; Qingliang Chen; | To address the two challenges, this paper proposes a novel Runge-Kutta Convolutional Compressed Sensing Network (RK-CCSNet). |
381 | Deep Hough Transform for Semantic Line Detection | Qi Han; Kai Zhao; Jun Xu; Ming-Ming Cheng; | In this paper, we put forward a simple yet effective method to detect meaningful straight lines, a.k.a. semantic lines, in given scenes. |
382 | Structured Landmark Detection via Topology-Adapting Deep Graph Learning | Weijian Li; Yuhang Lu; Kang Zheng; Haofu Liao; Chihung Lin; Jiebo Luo; Chi-Tung Cheng; Jing Xiao; Le Lu; Chang-Fu Kuo; Shun Miao; | In this work, we present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection. |
383 | 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning | Xiangyu Xu; Hao Chen; Francesc Moreno-Noguer; László A. Jeni; Fernando De la Torre; | To address the above issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme. |
384 | Learning to Balance Specificity and Invariance for In and Out of Domain Generalization | Prithvijit Chattopadhyay; Yogesh Balaji; Judy Hoffman; | We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance. |
385 | Contrastive Learning for Unpaired Image-to-Image Translation | Taesung Park; Alexei A. Efros; Richard Zhang; Jun-Yan Zhu; | We propose a straightforward method for doing so: maximizing mutual information between the two, using a framework based on contrastive learning. |
386 | DLow: Diversifying Latent Flows for Diverse Human Motion Prediction | Ye Yuan; Kris Kitani; | To address these problems, we propose a novel sampling method, Diversifying Latent Flows (DLow), to produce a diverse set of samples from a pretrained deep generative model. |
387 | GRNet: Gridding Residual Network for Dense Point Cloud Completion | Haozhe Xie; Hongxun Yao; Shangchen Zhou; Jiageng Mao; Shengping Zhang; Wenxiu Sun; | To solve this problem, we introduce 3D grids as intermediate representations to regularize unordered point clouds. |
388 | Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition | Saihui Hou; Chunshui Cao; Xu Liu; Yongzhen Huang; | In this work, we propose a novel network named Gait Lateral Network (GLN) which can learn both discriminative and compact representations from the silhouettes for gait recognition. |
389 | Blind Face Restoration via Deep Multi-scale Component Dictionaries | Xiaoming Li; Chaofeng Chen; Shangchen Zhou; Xianhui Lin; Wangmeng Zuo; Lei Zhang; | To address this issue, this paper suggests a deep face dictionary network (termed as DFDNet) to guide the restoration process of degraded observations. |
390 | Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods | Byungjoo Kim; Bryce Chudomelka; Jinyoung Park; Jaewoo Kang; Youngjoon Hong; Hyunwoo J. Kim; | Motivated by the SSP property and a generalized Runge-Kutta method, we proposed Strong Stability Preserving networks (SSP networks) which improve robustness against adversarial attacks. |
391 | Inequality-Constrained and Robust 3D Face Model Fitting | Evangelos Sariyanidi; Casey J. Zampella; Robert T. Schultz; Birkan Tunc; | We propose a new formulation that does not require the tuning of any weight parameter. |
392 | Gabor Layers Enhance Network Robustness | Juan C. Pérez; Motasem Alfarra; Guillaume Jeanneret; Adel Bibi; Ali Thabet; Bernard Ghanem; Pablo Arbeláez; | In particular, we explore the effect of replacing the first layers of various deep architectures with Gabor layers (i.e. convolutional layers with filters that are based on learnable Gabor parameters) on robustness against adversarial attacks. |
393 | Conditional Image Repainting via Semantic Bridge and Piecewise Value Function | Shuchen Weng; Wenbo Li; Dawei Li; Hongxia Jin; Boxin Shi; | In this work, we improve the compositing by breaking through the latent ceiling using a novel piecewise value function. |
394 | Learnable Cost Volume Using the Cayley Representation | Taihong Xiao; Jinwei Yuan; Deqing Sun; Qifei Wang; Xin-Yu Zhang; Kehan Xu; Ming-Hsuan Yang; | To address this issue, we propose a learnable cost volume (LCV) using an elliptical inner product, which generalizes the standard inner product by a positive definite kernel matrix. |
395 | HALO: Hardware-Aware Learning to Optimize | Chaojian Li; Tianlong Chen; Haoran You; Zhangyang Wang; Yingyan Lin; | To this end, we propose hardware-aware learning to optimize (HALO), a practical meta optimizer dedicated to resource-efficient on-device adaptation. |
396 | Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling | Jia Zheng; Junfei Zhang; Jing Li; Rui Tang; Shenghua Gao; Zihan Zhou; | In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. |
397 | BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition | Yonghyun Kim; Wonpyo Park; Jongju Shin; | To overcome this difficulty, we propose a novel method called BroadFace, which is a learning process to consider a massive set of identities, comprehensively. |
398 | Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Xinzhe Han; Shuhui Wang; Chi Su; Weigang Zhang; Qingming Huang; Qi Tian; | In this paper, we rethink implicit reasoning process in VQA, and propose a new formulation which maximizes the log-likelihood of joint distribution for the observed question and predicted answer. |
399 | Domain Adaptive Semantic Segmentation Using Weak Labels | Sujoy Paul; Yi-Hsuan Tsai; Samuel Schulter; Amit K. Roy-Chowdhury; Manmohan Chandraker; | We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. In experiments, we show considerable improvements with respect to the existing state of the art in UDA and present a new benchmark in the WDA setting. |
400 | Knowledge Distillation Meets Self-Supervision | Guodong Xu; Ziwei Liu; Xiaoxiao Li; Chen Change Loy; | In this paper, we discuss practical ways to exploit those noisy self-supervision signals with selective transfer for distillation. |
401 | Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions | Ignacio Rocco; Relja Arandjelović; Josef Sivic; | In this work we target the problem of estimating accurately localised correspondences between a pair of images. |
402 | Reconstructing the Noise Variance Manifold for Image Denoising | Ioannis Marras; Grigorios G. Chrysos; Ioannis Alexiou; Gregory Slabaugh; Stefanos Zafeiriou; | To fill the gap, in this work we introduce the idea of a cGAN which explicitly leverages structure in the image noise variance space. |
403 | Occlusion-Aware Depth Estimation with Adaptive Normal Constraints | Xiaoxiao Long; Lingjie Liu; Christian Theobalt; Wenping Wang; | We present a new learning-based method for multi-frame depth estimation from a color video, which is a fundamental problem in scene understanding, robot navigation or handheld 3D reconstruction. |
404 | VisualEchoes: Spatial Image Representation Learning through Echolocation | Ruohan Gao; Changan Chen; Ziad Al-Halah; Carl Schissler; Kristen Grauman; | We explore the spatial cues contained in echoes and how they can benefit vision tasks that require spatial reasoning. |
405 | Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval | Andrew Brown; Weidi Xie; Vicky Kalogeiton; Andrew Zisserman; | To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP. |
406 | Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation | Liang-Chieh Chen; Raphael Gontijo Lopes; Bowen Cheng; Maxwell D. Collins; Ekin D. Cubuk; Barret Zoph; Hartwig Adam; Jonathon Shlens; | In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation. |
407 | Spatially Aware Multimodal Transformers for TextVQA | Yash Kant; Dhruv Batra; Peter Anderson; Alexander Schwing; Devi Parikh; Jiasen Lu; Harsh Agrawal; | In contrast, we propose a novel spatially aware self-attention layer such that each visual entity only looks at neighboring entities defined by a spatial graph. |
408 | Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector | Cheng-Chun Hsu; Yi-Hsuan Tsai; Yen-Yu Lin; Ming-Hsuan Yang; | Different from existing solutions, we propose a domain adaptation framework that accounts for each pixel, especially via predicting pixel-wise objectness and centerness. |
409 | URIE: Universal Image Enhancement for Visual Recognition in the Wild | Taeyoung Son; Juwon Kang; Namyup Kim; Sunghyun Cho; Suha Kwak; | To tackle this issue, we present a Universal and Recognition-friendly Image Enhancement network, dubbed URIE, which is attached in front of existing recognition models and enhances distorted input to improve their performance without retraining them. |
410 | Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation | Hongwei Yi; Zizhuang Wei; Mingyu Ding; Runze Zhang; Yisong Chen; Guoping Wang; Yu-Wing Tai; | In this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction. |
411 | SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning | Junbing Li; Changqing Zhang; Pengfei Zhu; Baoyuan Wu; Lei Chen; Qinghua Hu; | In this work, we propose to select a small subset of labels as landmarks which are easy to predict according to input (predictable) and can well recover the other possible labels (representative). |
412 | Unpaired Image-to-Image Translation using Adversarial Consistency Loss | Yihao Zhao; Ruihai Wu; Hao Dong; | In this paper, we propose a novel adversarial-consistency loss for image-to-image translation. |
413 | Discriminability Distillation in Group Representation Learning | Manyuan Zhang; Guanglu Song; Hang Zhou; Yu Liu; | We claim the most significant indicator to show whether the group representation can be benefited from one of its elements is not the quality or an inexplicable score, but the discriminability w.r.t. the model. |
414 | Monocular Expressive Body Regression through Body-Driven Attention | Vasileios Choutas; Georgios Pavlakos; Timo Bolkart; Dimitrios Tzionas; Michael J. Black; | We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. |
415 | Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation | Zongsheng Yue; Qian Zhao; Lei Zhang; Deyu Meng; | In this work, we propose a novel unified framework to simultaneously deal with the noise removal and noise generation tasks. |
416 | Linguistic Structure Guided Context Modeling for Referring Image Segmentation | Tianrui Hui; Si Liu; Shaofei Huang; Guanbin Li; Sansi Yu; Faxi Zhang; Jizhong Han; | To tackle this problem, we propose a “gather-propagate-distribute” scheme to model multimodal context by crossmodal interaction and implement this scheme as a novel Linguistic Structure guided Context Modeling (LSCM) module. |
417 | Federated Visual Classification with Real-World Data Distribution | Tzu-Ming Harry Hsu; Hang Qi; Matthew Brown; | In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. |
418 | Robust Re-Identification by Multiple Views Knowledge Distillation | Angelo Porrello; Luca Bergamini; Simone Calderara; | In this work, we devise a training strategy that allows the transfer of superior knowledge, arising from a set of views depicting the target object. |
419 | Defocus Deblurring Using Dual-Pixel Data | Abdullah Abuolaim; Michael S. Brown; | We propose an effective defocus deblurring method that exploits data available on dual-pixel (DP) sensors found on most modern cameras. |
420 | RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos | Tianshu Yu; Yikang Li; Baoxin Li; | To address this, we propose Rhythmic RNN (RhyRNN) which is capable of handling long video sequences (up to 3,000 frames) as well as capturing rhythms at different scales. |
421 | Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping | Uttaran Bhattacharya; Christian Roncal; Trisha Mittal; Rohan Chandra; Kyra Kapsaskis; Kurt Gray; Aniket Bera; Dinesh Manocha; | We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data and represented as sequences of 3D poses. |
422 | Weighing Counts: Sequential Crowd Counting by Reinforcement Learning | Liang Liu; Hao Lu; Hongwei Zou; Haipeng Xiong; Zhiguo Cao; Chunhua Shen; | Inspired by scale weighing, we propose a novel ‘counting scale’ termed LibraNet where the count value is analogized by weight. |
423 | Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks | Yunfei Liu; Xingjun Ma; James Bailey; Feng Lu; | In this paper, we present a new type of backdoor attack inspired by an important natural phenomenon: reflection. |
424 | Learning to Learn with Variational Information Bottleneck for Domain Generalization | Yingjun Du; Jun Xu; Huan Xiong; Qiang Qiu; Xiantong Zhen; Cees G. M. Snoek; Ling Shao; | Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift. In this paper, we address both problems. |
425 | Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis | Ruixuan Yu; Xin Wei; Federico Tombari; Jian Sun; | In this paper we propose a rotation-invariant deep network for point cloud analysis. |
426 | Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks | Gil Shomron; Ron Banner; Moran Shkolnik; Uri Weiser; | Inspired by the observation that spatial correlation exists in CNN output feature maps (ofms), we propose a method to dynamically predict whether ofm activations are zero-valued or not according to their neighboring activation values, thereby avoiding zero-valued activations and reducing the number of convolution operations. |
427 | Layered Neighborhood Expansion for Incremental Multiple Graph Matching | Zixuan Chen; Zhihui Xie; Junchi Yan; Yinqiang Zheng; Xiaokang Yang; | In this paper, we treat the graphs as graphs on a super-graph, and propose a novel breadth first search based method for expanding the neighborhood on the super-graph for a new coming graph, such that the matching with the new graph can be efficiently performed within the constructed neighborhood. |
428 | SCAN: Learning to Classify Images without Labels | Wouter Van Gansbeke; Simon Vandenhende; Stamatios Georgoulis; Marc Proesmans; Luc Van Gool; | In this paper, we deviate from recent works, and advocate a two-step approach where feature learning and clustering are decoupled. |
429 | Graph convolutional networks for learning with few clean and many noisy labels | Ahmet Iscen; Giorgos Tolias; Yannis Avrithis; Ondřej Chum; Cordelia Schmid; | In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given. |
430 | Object-and-Action Aware Model for Visual Language Navigation | Yuankai Qi; Zizheng Pan; Shengping Zhang; Anton van den Hengel; Qi Wu; | In this paper, we propose an Object-and-Action Aware Model (OAAM) that processes these two different forms of natural language based instruction separately. |
431 | A Comprehensive Study of Weight Sharing in Graph Networks for 3D Human Pose Estimation | Kenkun Liu; Rongqi Ding; Zhiming Zou; Le Wang; Wei Tang; | The objective of this paper is to have a comprehensive and systematic study of weight sharing in GCNs for 3D HPE. |
432 | MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution | Wenbo Li; Xin Tao; Taian Guo; Lu Qi; Jiangbo Lu; Jiaya Jia; | Motivated by these findings, we propose a temporal multi-correspondence aggregation strategy to leverage most similar patches across frames, and also a cross-scale nonlocal-correspondence aggregation scheme to explore self-similarity of images across scales. |
433 | Efficient Semantic Video Segmentation with Per-frame Inference | Yifan Liu; Chunhua Shen; Changqian Yu; Jingdong Wang; | In contrast, here we explicitly consider the temporal consistency among frames as extra constraints during training and process each frame independently in the inference phase. |
434 | Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers | Christoph Kamann; Carsten Rother; | We present a new training schema that increases this shape bias. |
435 | Deep Spiking Neural Network: Energy Efficiency Through Time based Coding | Bing Han; Kaushik Roy; | In this work, we propose an ANN to SNN conversion methodology that uses a time-based coding scheme, named Temporal-Switch-Coding (TSC), and a corresponding TSC spiking neuron model. |
436 | InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling | Jun Wang; Shiyi Lan; Mingfei Gao; Larry S. Davis; | To address this issue, we propose a novel 3D object detection framework with dynamic information modeling. |
437 | Utilizing Patch-level Category Activation Patterns for Multiple Class Novelty Detection | Poojan Oza; Vishal M. Patel; | In this paper, we propose a novel method that makes deep convolutional neural networks robust to novel classes. |
438 | People as Scene Probes | Yifan Wang; Brian L. Curless; Steven M. Seitz; | By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism. |
439 | Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes | Lei Yang; Wenxi Liu; Zhiming Cui; Nenglun Chen; Wenping Wang; | We propose an unsupervised learning framework with the pretext task of finding dense correspondences between point cloud shapes from the same category based on the cycle-consistency formulation. |
440 | Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions | Matheus Gadelha; Aruni RoyChowdhury; Gopal Sharma; Evangelos Kalogerakis; Liangliang Cao; Erik Learned-Miller; Rui Wang; Subhransu Maji; | In this paper, we investigate the use of Approximate Convex Decompositions (ACD) as a self-supervisory signal for label-efficient learning of point cloud representations. |
441 | TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video | Tiancheng Zhi; Christoph Lassner; Tony Tung; Carsten Stoll; Srinivasa G. Narasimhan; Minh Vo; | We present TexMesh, a novel approach to reconstruct detailed human meshes with high-resolution full-body texture from RGB-D video. |
442 | Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost | Mingfei Gao; Zizhao Zhang; Guo Yu; Sercan Ö. Arık; Larry S. Davis; Tomas Pfister; | Here, we propose to unify unlabeled sample selection and model training towards minimizing labeling cost, and make two contributions towards that end. |
443 | Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation | Fangyun Wei; Xiao Sun; Hongyang Li; Jingdong Wang; Stephen Lin; | While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. |
444 | Modeling 3D Shapes by Reinforcement Learning | Cheng Lin; Tingxiang Fan; Wenping Wang; Matthias Nießner; | Inspired by such artist-based modeling, we propose a two-step neural framework based on RL to learn 3D modeling policies. |
445 | LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform | Lida Li; Kun Wang; Shuai Li; Xiangchu Feng; Lei Zhang; | In this paper, we propose to mitigate this issue by learning a CNN with a learnable sparse transform (LST), which converts the input features into a more compact and sparser domain so that the spatial and channel-wise redundancy can be more effectively reduced. |
446 | Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision | Damien Teney; Ehsan Abbasnedjad; Anton van den Hengel; | We propose an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets. |
447 | CN: Channel Normalization For Point Cloud Recognition | Zetong Yang; Yanan Sun; Shu Liu; Xiaojuan Qi; Jiaya Jia; | In this paper, we deeply analyze these point recognition frameworks and present a factor, called difference ratio, to measure the influence of structure information among different levels on the final representation. |
448 | Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model | Ning Zhang; Junchi Yan; | In this work, we propose novel perspectives on the DBD problem and design a convenient approach to build a real-time cost-effective DBD model. |
449 | AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning | Jianchao Zhu; Liangliang Shi; Junchi Yan; Hongyuan Zha; | This paper proposes new ways of sample mixing by thinking of the process as generation of barycenter in a metric space for data augmentation. |
450 | Scene Text Image Super-resolution in the wild | Wenjia Wang; Enze Xie; Xuebo Liu; Wenhai Wang; Ding Liang; Chunhua Shen; Xiang Bai; | For this purpose, a new Text Super-Resolution Network, termed TSRN, with three novel modules is developed. |
451 | Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling | Omid Poursaeed; Matthew Fisher; Noam Aigerman; Vladimir G. Kim; | We propose a novel neural architecture for representing 3D surfaces, which harnesses two complementary shape representations: (i) an explicit representation via an atlas, i.e., embeddings of 2D domains into 3D; (ii) an implicit-function representation, i.e., a scalar function over the 3D volume, with its levels denoting surfaces. |
452 | Learning Disentangled Representations with Latent Variation Predictability | Xinqi Zhu; Chang Xu; Dacheng Tao; | This paper defines the variation predictability of latent disentangled representations. |
453 | Deep Space-Time Video Upsampling Networks | Jaeyeon Kang; Younghyun Jo; Seoung Wug Oh; Peter Vajda; Seon Joo Kim; | In this paper, we investigate the problem of jointly upsampling videos both in space and time, which is becoming more important with advances in display systems. |
454 | Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery | Shuo Wang; Jun Yue; Jianzhuang Liu; Qi Tian; Meng Wang; | To solve these problems, we propose a method based on multi-modal knowledge discovery. |
455 | Fast Video Object Segmentation using the Global Context Module | Yu Li; Zhuoran Shen; Ying Shan; | We developed a real-time, high-quality semi-supervised video object segmentation algorithm. |
456 | Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos | Anurag Arnab; Chen Sun; Arsha Nagrani; Cordelia Schmid; | In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels, which are significantly easier to annotate. |
457 | Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification | Nikita Dvornik; Cordelia Schmid; Julien Mairal; | In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. |
458 | MessyTable: Instance Association in Multiple Camera Views | Zhongang Cai; Junzhe Zhang; Daxuan Ren; Cunjun Yu; Haiyu Zhao; Shuai Yi; Chai Kiat Yeo; Chen Change Loy; | We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views. |
459 | A Unified Framework for Shot Type Classification Based on Subject Centric Lens | Anyi Rao; Jiaze Wang; Linning Xu; Xuekun Jiang; Qingqiu Huang; Bolei Zhou; Dahua Lin; | To address these issues, we propose a learning framework Subject Guidance Network (SGNet) for shot type recognition. |
460 | BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues | Samuel Albanie; Gül Varol; Liliane Momeni; Triantafyllos Afouras; Joon Son Chung; Neil Fox; Andrew Zisserman; | In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We also propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area. |
461 | HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization | Neng Qian; Jiayi Wang; Franziska Mueller; Florian Bernard; Vladislav Golyanik; Christian Theobalt; | To fill this gap, in this work we present HTML, the first parametric texture model of human hands. |
462 | CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions | Zhongdao Wang; Jingwei Zhang; Liang Zheng; Yixuan Liu; Yifan Sun; Yali Li; Shengjin Wang; | This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering. |
463 | Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions | Xihui Liu; Zhe Lin; Jianming Zhang; Handong Zhao; Quan Tran; Xiaogang Wang; Hongsheng Li; | We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions. |
464 | Towards Real-Time Multi-Object Tracking | Zhongdao Wang; Liang Zheng; Yixuan Liu; Yali Li; Shengjin Wang; | In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. |
465 | A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation | Jian Liang; Yunbo Wang; Dapeng Hu; Ran He; Jiashi Feng; | In this paper, we build on domain adversarial learning and propose a novel domain adaptation method BA$^3$US with two new techniques termed Balanced Adversarial Alignment (BAA) and Adaptive Uncertainty Suppression (AUS), respectively. |
466 | Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss | Yang Li; Shichao Kan; Zhihai He; | To characterize the consistent pattern of human attention during image comparisons, we introduce the idea of transformed attention consistency. |
467 | STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | Ali Athar; Sabarinath Mahadevan; Aljosa Osep; Laura Leal-Taixé; Bastian Leibe; | In this paper, we propose a different approach that is well-suited to a variety of tasks involving instance segmentation in videos. |
468 | Hierarchical Style-based Networks for Motion Synthesis | Jingwei Xu; Huazhe Xu; Bingbing Ni; Xiaokang Yang; Xiaolong Wang; Trevor Darrell; | In this paper, we propose an unsupervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location. |
469 | Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop | Benjamin Biggs; Oliver Boyne; James Charles; Andrew Fitzgibbon; Roberto Cipolla; | We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images. |
470 | Learning to Count in the Crowd from Limited Labeled Data | Vishwanath A. Sindagi; Rajeev Yasarla; Deepak Sam Babu; R. Venkatesh Babu; Vishal M. Patel; | In this work, we focus on reducing the annotation efforts by learning to count in the crowd from a limited number of labeled samples while leveraging a large pool of unlabeled data. |
471 | SPOT: Selective Point Cloud Voting for Better Proposal in Point Cloud Object Detection | Hongyuan Du; Linjun Li; Bo Liu; Nuno Vasconcelos; | In this work, we propose the Selective Point clOud voTing (SPOT) module, a simple and effective component that can be easily trained end-to-end in point cloud object detectors to solve this problem. |
472 | Explainable Face Recognition | Jonathan R. Williford; Brandon B. May; Jeffrey Byrne; | In this paper, we provide the first comprehensive benchmark and baseline evaluation for explainable face recognition (XFR). Finally, we provide a comprehensive benchmark on this dataset comparing five state-of-the-art XFR algorithms on three facial matchers. |
473 | From Shadow Segmentation to Shadow Removal | Hieu Le; Dimitris Samaras; | We propose a shadow removal method that can be trained using only shadow and non-shadow patches cropped from the shadow images themselves. |
474 | Diverse and Admissible Trajectory Prediction through Multimodal Context Understanding | Seong Hyeon Park; Gyubok Lee; Jimin Seo; Manoj Bhat; Minseok Kang; Jonathan Francis; Ashwin Jadhav; Paul Pu Liang; Louis-Philippe Morency; | In this paper, we propose a model that synthesizes multiple input signals from the multimodal world (the environment’s scene context and interactions between multiple surrounding agents) to best model all diverse and admissible trajectories. |
475 | CONFIG: Controllable Neural Face Image Generation | Marek Kowalski; Stephan J. Garbin; Virginia Estellers; Tadas Baltrušaitis; Matthew Johnson; Jamie Shotton; | To this end we propose ConfigNet, a neural face model that allows for controlling individual aspects of output images in semantically meaningful ways and that is a significant step on the path towards finely-controllable neural rendering. |
476 | Single View Metrology in the Wild | Rui Zhu; Xingyi Yang; Yannick Hold-Geoffroy; Federico Perazzi; Jonathan Eisenmann; Kalyan Sunkavalli; Manmohan Chandraker; | We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground as well as camera parameters of orientation and field of view, using just a monocular image acquired in unconstrained conditions. |
477 | Procedure Planning in Instructional Videos | Chien-Yi Chang; De-An Huang; Danfei Xu; Ehsan Adeli; Li Fei-Fei; Juan Carlos Niebles; | In this paper, we study the problem of procedure planning in instructional videos, which can be seen as the first step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking. |
478 | Funnel Activation for Visual Recognition | Ningning Ma; Xiangyu Zhang; Jian Sun; | We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition. |
479 | GIQA: Generated Image Quality Assessment | Shuyang Gu; Jianmin Bao; Dong Chen; Fang Wen; | We introduce three GIQA algorithms from two perspectives: learning-based and data-based. |
480 | Adversarial Continual Learning | Sayna Ebrahimi; Franziska Meier; Roberto Calandra; Trevor Darrell; Marcus Rohrbach; | We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks. |
481 | Adapting Object Detectors with Conditional Domain Normalization | Peng Su; Kun Wang; Xingyu Zeng; Shixiang Tang; Dapeng Chen; Di Qiu; Xiaogang Wang; | In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain distribution gap. |
482 | HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction | Tianjiao Li; Jun Liu; Wei Zhang; Lingyu Duan; | In this paper, we propose a novel Hardness-AwaRe Discrimination Network (HARD-Net) to specifically investigate the relationships between similar activity pairs that are hard to discriminate. |
483 | Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction | Lokender Tiwari; Pan Ji; Quoc-Huy Tran; Bingbing Zhuang; Saket Anand; Manmohan Chandraker; | In this paper, we demonstrate that coupling the two (monocular SLAM and depth prediction), by leveraging the strengths of each, mitigates the other’s shortcomings. |
484 | Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting | Shengcai Liao; Ling Shao; | In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps. |
485 | Self-supervised Bayesian Deep Learning for Image Recovery with Applications to Compressive Sensing | Tongyao Pang; Yuhui Quan; Hui Ji; | Motivated by the practical value of reducing the cost and complexity of constructing labeled training datasets, this paper proposes a self-supervised deep learning approach for image recovery, which is dataset-free. |
486 | Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement | Jian Wang; Xiang Long; Yuan Gao; Errui Ding; Shilei Wen; | In this paper, we aim to find a better approach to get more accurate localization results. |
487 | Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction | Minchul Shin; | With that in mind, we propose a multi-teacher-single-student (MTSS) approach inspired by the multi-task learning and the distillation of semi-supervised learning. |
488 | Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification | Fang Zhao; Shengcai Liao; Guo-Sen Xie; Jian Zhao; Kaihao Zhang; Ling Shao; | To suppress noise in pseudo-labels, this paper proposes a Noise Resistible Mutual-Training (NRMT) method, which maintains two networks during training to perform collaborative clustering and mutual instance selection. |
489 | DPDist: Comparing Point Clouds Using Deep Point Cloud Distance | Dahlia Urbach; Yizhak Ben-Shabat; Michael Lindenbaum; | We introduce a new deep learning method for point cloud comparison. |
490 | Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation | Xiaokang Chen; Kwan-Yee Lin; Jingbo Wang; Wayne Wu; Chen Qian; Hongsheng Li; Gang Zeng; | In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively. |
491 | DataMix: Efficient Privacy-Preserving Edge-Cloud Inference | Zhijian Liu; Zhanghao Wu; Chuang Gan; Ligeng Zhu; Song Han; | In this paper, we mediate between the resource-constrained edge devices and the privacy-invasive cloud servers by introducing a novel privacy-preserving edge-cloud inference framework, DataMix. |
492 | Neural Re-Rendering of Humans from a Single Image | Kripasindhu Sarkar; Dushyant Mehta; Weipeng Xu; Vladislav Golyanik; Christian Theobalt; | To address these challenges, we propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint given one input image. |
493 | Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation | Filippo Aleotti; Fabio Tosi; Li Zhang; Matteo Poggi; Stefano Mattoccia; | In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm reversing the link between the two. |
494 | PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration | Jinjin Gu; Haoming Cai; Haoyu Chen; Xiaoxing Ye; Jimmy S. Ren; Chao Dong; | Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. |
495 | Why do These Match? Explaining the Behavior of Image Similarity Models | Bryan A. Plummer; Mariya I. Vasileva; Vitali Petsiuk; Kate Saenko; David Forsyth; | In this paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a model’s output is a score measuring the similarity of two inputs rather than a classification score. |
496 | CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing | Xuanhong Chen; Bingbing Ni; Naiyuan Liu; Ziang Liu; Yiliu Jiang; Loc Truong; Qi Tian; | To address these issues, we propose a novel pixel translation framework called Cooperative GAN (CooGAN) for HR facial image editing. |
497 | Progressive Transformers for End-to-End Sign Language Production | Ben Saunders; Necati Cihan Camgoz; Richard Bowden; | In this paper, we propose Progressive Transformers, the first SLP model to translate from discrete spoken language sentences to continuous 3D sign pose sequences in an end-to-end manner. |
498 | Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting | Minghui Liao; Guan Pang; Jing Huang; Tal Hassner; Xiang Bai; | To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. |
499 | Making Affine Correspondences Work in Camera Geometry Computation | Daniel Barath; Michal Polic; Wolfgang Förstner; Torsten Sattler; Tomas Pajdla; Zuzana Kukelova; | We propose a method for refining the local feature geometries by symmetric intensity-based matching, combine uncertainty propagation inside RANSAC with preemptive model verification, show a general scheme for computing uncertainty of minimal solvers results, and adapt the sample cheirality check for homography estimation to region-to-region correspondences. |
500 | Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces | Jiankang Deng; Jia Guo; Tongliang Liu; Mingming Gong; Stefanos Zafeiriou; | In this paper, we relax the intra-class constraint of ArcFace to improve the robustness to label noise. |
501 | Foley Music: Learning to Generate Music from Videos | Chuang Gan; Deng Huang; Peihao Chen; Joshua B. Tenenbaum; Antonio Torralba; | In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. |
502 | Contrastive Multiview Coding | Yonglong Tian; Dilip Krishnan; Phillip Isola; | We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. |
503 | Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses | Yingwei Li; Song Bai; Cihang Xie; Zhenyu Liao; Xiaohui Shen; Alan Yuille; | This paper focuses on learning transferable adversarial examples specifically against defense models (models that defend against adversarial attacks). |
504 | Generative Low-bitwidth Data Free Quantization | Shoukai Xu; Haokun Li; Bohan Zhuang; Jing Liu; Jiezhang Cao; Chuangrun Liang; Mingkui Tan; | In this paper, we investigate a simple-yet-effective method called Generative Low-bitwidth Data Free Quantization (GDFQ) to remove the data dependence burden. |
505 | Local Correlation Consistency for Knowledge Distillation | Xiaojie Li; Jianlong Wu; Hongyu Fang; Yue Liao; Fei Wang; Chen Qian; | In this paper, we propose the local correlation exploration framework for knowledge distillation. |
506 | Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild | Jason Y. Zhang; Sam Pepose; Hanbyul Joo; Deva Ramanan; Jitendra Malik; Angjoo Kanazawa; | We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. |
507 | Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation | Hang Zhou; Xudong Xu; Dahua Lin; Xiaogang Wang; Ziwei Liu; | To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio. |
508 | CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations | Yuanhan Zhang; ZhenFei Yin; Yidong Li; Guojun Yin; Junjie Yan; Jing Shao; Ziwei Liu; | Our key insight is that, compared with the commonly-used binary supervision or mid-level geometric representations, rich semantic annotations as auxiliary tasks can greatly boost the performance and generalizability of face anti-spoofing across a wide range of spoof attacks. |
509 | Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues | Yuyang Qian; Guojun Yin; Lu Sheng; Zixuan Chen; Jing Shao; | To introduce frequency into the face forgery detection, we propose a novel Frequency in Face Forgery Network (F$^3$-Net), taking advantage of two different but complementary frequency-aware clues, 1) frequency-aware decomposed image components, and 2) local frequency statistics, to deeply mine the forgery patterns via our two-stream collaborative learning framework. |
510 | Weakly-Supervised Cell Tracking via Backward-and-Forward Propagation | Kazuya Nishimura; Junya Hayashida; Chenyang Wang; Dai Fei Elmer Ker; Ryoma Bise; | We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of "cell detection" (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining. |
511 | SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation | John Yang; Hyung Jin Chang; Seungeui Lee; Nojun Kwak; | In this paper, we attempt to not only consider the appearance of a hand but incorporate the temporal movement information of a hand in motion into the learning framework for better 3D hand pose estimation performance, which leads to the necessity of a large scale dataset with sequential RGB hand images. |
512 | Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization | Zijie Zhuang; Longhui Wei; Lingxi Xie; Tianyu Zhang; Hengheng Zhang; Haozhe Wu; Haizhou Ai; Qi Tian; | This paper rethinks the working mechanism of conventional ReID approaches and puts forward a new solution. |
513 | AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation | Xiaobing Zhang; Shijian Lu; Haigang Gong; Zhipeng Luo; Ming Liu; | In this work, we propose an innovative adversarial-based mutual learning network (AMLN) that introduces process-driven learning beyond outcome-driven learning for augmented online knowledge distillation. |
514 | Online Multi-modal Person Search in Videos | Jiangyue Xia; Anyi Rao; Qingqiu Huang; Linning Xu; Jiangtao Wen; Dahua Lin; | In this paper, we propose an online person search framework, which can recognize people in a video on the fly. |
515 | Single Image Super-Resolution via a Holistic Attention Network | Ben Niu; Weilei Wen; Wenqi Ren; Xiangde Zhang; Lianping Yang; Shuzhen Wang; Kaihao Zhang; Xiaochun Cao; Haifeng Shen; | To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. |
516 | Can You Read Me Now? Content Aware Rectification using Angle Supervision | Amir Markovitz; Inbal Lavi; Or Perel; Shai Mazor; Roee Litman; | We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification that relies on the document’s content, the location of the words and specifically their orientation, as hints to assist in the rectification process. |
517 | Momentum Batch Normalization for Deep Learning with Small Batch Size | Hongwei Yong; Jianqiang Huang; Deyu Meng; Xiansheng Hua; Lei Zhang; | To gain a deeper understanding of BN, in this work we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process, while the noise level depends only on the batch size. |
518 | AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds | Abdullah Hamdi; Sara Rojas; Ali Thabet; Bernard Ghanem; | In this work, we present novel data-driven adversarial attacks against 3D point cloud networks. |
519 | Edge-aware Graph Representation Learning and Reasoning for Face Parsing | Gusi Te; Yinglu Liu; Wei Hu; Hailin Shi; Tao Mei; | To this end, we propose to model and reason the region-wise relations by learning graph representations, and leverage the edge information between regions for optimized abstraction. |
520 | BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network | Deng-Ping Fan; Yingjie Zhai; Ali Borji; Jufeng Yang; Ling Shao; | In this paper, we make the first attempt to leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to develop a novel cascaded refinement network. |
521 | G-LBM:Generative Low-dimensional Background Model Estimation from Video Sequences | Behnaz Rezaei; Amirreza Farnoosh; Sarah Ostadabbas; | In this paper, we propose a computationally tractable and theoretically supported non-linear low-dimensional generative model to represent real-world data in the presence of noise and sparse outliers. |
522 | H3DNet: 3D Object Detection Using Hybrid Geometric Primitives | Zaiwei Zhang; Bo Sun; Haitao Yang; Qixing Huang; | We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BB) and their semantic labels. |
523 | Expressive Telepresence via Modular Codec Avatars | Hang Chu; Shugao Ma; Fernando De la Torre; Sanja Fidler; Yaser Sheikh; | This paper aims in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in the VR headset. |
524 | Cascade Graph Neural Networks for RGB-D Salient Object Detection | Ao Luo; Xin Li; Fan Yang; Zhicheng Jiao; Hong Cheng; Siwei Lyu; | In this paper, we study the problem of salient object detection for RGB-D images by using both color and depth information. |
525 | FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret | Vishnu Suresh Lokhande; Aditya Kumar Akash; Sathya N. Ravi; Vikas Singh; | Here, we study mechanisms that impose fairness concurrently while training the model. |
526 | Generating Videos of Zero-Shot Compositions of Actions and Objects | Megha Nawhal; Mengyao Zhai; Andreas Lehrmann; Leonid Sigal; Greg Mori; | In this paper we develop methods for generating such videos — making progress toward addressing the important, open problem of video generation in complex scenes. |
527 | ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language | Zhe Wang; Zhiyuan Fang; Jun Wang; Yezhou Yang; | To be concrete, our Visual-Textual Attribute Alignment model (dubbed as ViTAA) learns to disentangle the feature space of a person into sub-spaces corresponding to attributes using a light auxiliary attribute segmentation layer. It then aligns these visual features with the textual attributes parsed from the sentences via a novel contrastive learning loss. |
528 | Renovating Parsing R-CNN for Accurate Multiple Human Parsing | Lu Yang; Qing Song; Zhihui Wang; Mengjie Hu; Chun Liu; Xueshi Xin; Wenhe Jia; Songcen Xu; | To reverse this phenomenon, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline. |
529 | Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning | Qing Yu; Daiki Ikami; Go Irie; Kiyoharu Aizawa; | Instead of training an OOD detector and SSL separately, we propose a multi-task curriculum learning framework. |
530 | Gradient-Induced Co-Saliency Detection | Zhao Zhang; Wenda Jin; Jun Xu; Ming-Ming Cheng; | In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection (GICD) method. To evaluate the performance of Co-SOD methods on discovering the co-salient object among multiple foregrounds, we construct a challenging CoCA dataset, where each image contains at least one extraneous foreground along with the co-salient object. |
531 | Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks | Wending Yan; Robby T. Tan; Dengxin Dai; | In this paper, we address the problem of nighttime defogging from a single image. |
532 | SegFix: Model-Agnostic Boundary Refinement for Segmentation | Yuhui Yuan; Jingyi Xie; Xilin Chen; Jingdong Wang; | We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model. |
533 | Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction | Cunjun Yu; Xiao Ma; Jiawei Ren; Haiyu Zhao; Shuai Yi; | In this paper, we present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction using only attention mechanisms. |
534 | Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars | Egor Zakharov; Aleksei Ivakhnenko; Aliaksandra Shysheya; Victor Lempitsky; | We propose a neural rendering-based system that creates head avatars from a single photograph. |
535 | Neural Geometric Parser for Single Image Camera Calibration | Jinwoo Lee; Minhyuk Sung; Hyunjoon Lee; Junho Kim; | We propose a neural geometric parser learning single image camera calibration for man-made scenes. |
536 | Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision | Yuxiang Wei; Ming Liu; Haolin Wang; Ruifeng Zhu; Guosheng Hu; Wangmeng Zuo; | We propose a novel Flow-based Feature Warping Model (FFWM) which can learn to synthesize photo-realistic and illumination preserving frontal images with illumination inconsistent supervision. |
537 | Learning Architectures for Binary Networks | Dahyun Kim; Kunal Pratap Singh; Jonghyun Choi; | Questioning that the architectures designed for FP networks might not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. |
538 | Semantic View Synthesis | Hsin-Ping Huang; Hung-Yu Tseng; Hsin-Ying Lee; Jia-Bin Huang; | To address the drawbacks, we propose a two-step approach. First, we focus on synthesizing the color and depth of the visible surface of the 3D scene. We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process. |
539 | An Analysis of Sketched IRLS for Accelerated Sparse Residual Regression | Daichi Iwata; Michael Waechter; Wen-Yan Lin; Yasuyuki Matsushita; | This paper studies the problem of sparse residual regression, i.e., learning a linear model using a norm that favors solutions in which the residuals are sparsely distributed. |
540 | Relative Pose from Deep Learned Depth and a Single Affine Correspondence | Ivan Eichhardt; Daniel Barath; | We propose a new approach for combining deep-learned nonmetric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. |
541 | Video Super-Resolution with Recurrent Structure-Detail Network | Takashi Isobe; Xu Jia; Shuhang Gu; Songjiang Li; Shengjin Wang; Qi Tian; | In this work, we propose a novel recurrent video super-resolution method which is both effective and efficient in exploiting previous frames to super-resolve the current frame. |
542 | Shape Adaptor: A Learnable Resizing Module | Shikun Liu; Zhe Lin; Yilin Wang; Jianming Zhang; Federico Perazzi; Edward Johns; | We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. |
543 | Shuffle and Attend: Video Domain Adaptation | Jinwoo Choi; Gaurav Sharma; Samuel Schulter; Jia-Bin Huang; | We address the problem of domain adaptation in videos for the task of human action recognition. |
544 | DRG: Dual Relation Graph for Human-Object Interaction Detection | Chen Gao; Jiarui Xu; Yuliang Zou; Jia-Bin Huang; | In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph (one human-centric and one object-centric). |
545 | Flow-edge Guided Video Completion | Chen Gao; Ayush Saraf; Jia-Bin Huang; Johannes Kopf; | We present a new flow-based video completion algorithm. |
546 | End-to-End Trainable Deep Active Contour Models for Automated Image Segmentation: Delineating Buildings in Aerial Imagery | Ali Hatamizadeh; Debleena Sengupta; Demetri Terzopoulos; | As a solution, we present Trainable Deep Active Contours (TDACs), an automatic image segmentation framework that intimately unites Convolutional Neural Networks (CNNs) and Active Contour Models (ACMs). |
547 | Towards End-to-end Video-based Eye-Tracking | Seonwook Park; Emre Aksan; Xucong Zhang; Otmar Hilliges; | In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships. |
548 | Generating Handwriting via Decoupled Style Descriptors | Atsunobu Kotani; Stefanie Tellex; James Tompkin; | Instead, we introduce the Decoupled Style Descriptor (DSD) model for handwriting, which factors both character- and writer-level styles and allows our model to represent an overall greater space of styles. |
549 | LEED: Label-Free Expression Editing via Disentanglement | Rongliang Wu; Shijian Lu; | This paper presents an innovative label-free expression editing via disentanglement (LEED) framework that is capable of editing the expression of both frontal and profile facial images without requiring any expression labels. |
550 | Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards | Xuewen Yang; Heming Zhang; Di Jin; Yingru Liu; Chi-Hao Wu; Jianchao Tan; Dongliang Xie; Jue Wang; Xin Wang; | The goal of this work is to develop a novel learning framework for accurate and expressive fashion captioning. |
551 | Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | Gouthaman KV; Anurag Mittal; | In this work, we propose a novel model-agnostic question encoder, Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect. |
552 | Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Jogendra Nath Kundu; Ambareesh Revanur; Govind Vitthal Waghmare; Rahul Mysore Venkatesh; R. Venkatesh Babu; | We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation. |
553 | Class-Incremental Domain Adaptation | Jogendra Nath Kundu; Rahul Mysore Venkatesh; Naveen Venkat; Ambareesh Revanur; R. Venkatesh Babu; | In this work, we effectively identify the limitations of these approaches in the CIDA paradigm. |
554 | Anti-Bandit Neural Architecture Search for Model Defense | Hanlin Chen; Baochang Zhang; Song Xue; Xuan Gong; Hong Liu; Rongrong Ji; David Doermann; | In this paper, we defend against adversarial attacks using neural architecture search (NAS) which is based on a comprehensive search of denoising blocks, weight-free operations, Gabor filters and convolutions. |
555 | Wavelet-Based Dual-Branch Network for Image Demoiréing | Lin Liu; Jianzhuang Liu; Shanxin Yuan; Gregory Slabaugh; Aleš Leonardis; Wengang Zhou; Qi Tian; | In this paper, we design a wavelet-based dual-branch network (WDNet) with a spatial attention mechanism for image demoiréing. |
556 | Low Light Video Enhancement using Synthetic Data Produced with an Intermediate Domain Mapping | Danai Triantafyllidou; Sean Moran; Steven McDonagh; Sarah Parisot; Gregory Slabaugh; | By generating dynamic video data synthetically, we enable a recently proposed state-of-the-art RAW-to-RGB model to attain higher image quality (improved colour, reduced artifacts) and improved temporal consistency, compared to the same model trained with only static real video data. |
557 | Non-Local Spatial Propagation Network for Depth Completion | Jinsun Park; Kyungdon Joo; Zhe Hu; Chi-Kuei Liu; In So Kweon; | In this paper, we propose a robust and efficient end-to-end non-local spatial propagation network for depth completion. |
558 | DanbooRegion: An Illustration Region Dataset | Lvmin Zhang; Yi JI; Chunping Liu; | We detail the challenges in achieving this dataset and present a human-in-the-loop workflow namely Feasibility-based Assignment Recommendation (FAR) to enable large-scale annotating. |
559 | Event Enhanced High-Quality Image Recovery | Bishan Wang; Jingwei He; Lei Yu; Gui-Song Xia; Wen Yang; | Based on this, we propose an explainable network, an event-enhanced sparse learning network (eSL-Net), to recover the high-quality images from event cameras. |
560 | PackDet: Packed Long-Head Object Detector | Kun Ding; Guojin He; Huxiang Gu; Zisha Zhong; Shiming Xiang; Chunhong Pan; | To solve this issue, we propose a packing operator (PackOp) to combine all head branches together spatially. |
561 | A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS | Xuefei Ning; Yin Zheng; Tianchen Zhao; Yu Wang; Huazhong Yang; | This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve the predictor-based neural architecture search. |
562 | Learning Semantic Neural Tree for Human Parsing | Ruyi Ji; Dawei Du; Libo Zhang; Longyin Wen; Yanjun Wu; Chen Zhao; Feiyue Huang; Siwei Lyu; | In this paper, we design a novel semantic neural tree for human parsing, which uses a tree architecture to encode physiological structure of human body, and design a coarse to fine process in a cascade manner to generate accurate results. |
563 | Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation | Wenbin Wang; Ruiping Wang; Shiguang Shan; Xilin Chen; | Therefore, we argue that a desirable scene graph should be also hierarchically constructed, and introduce a new scheme for modeling scene graph. |
564 | Burst Denoising via Temporally Shifted Wavelet Transforms | Xuejian Rong; Denis Demandolx; Kevin Matzen; Priyam Chatterjee; Yingli Tian; | We propose an end-to-end trainable burst denoising pipeline which jointly captures high-resolution and high-frequency deep features derived from wavelet transforms. |
565 | JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans | Fengze Liu; Jinzheng Cai; Yuankai Huo; Chi-Tung Cheng; Ashwin Raju; Dakai Jin; Jing Xiao; Alan Yuille; Le Lu; ChienHung Liao; Adam P. Harrison; | In this work, we propose a novel multi-task learning system, JSSR, based on an end-to-end 3D convolutional neural network that is composed of a generator, a registration and a segmentation component. |
566 | SimAug: Learning Robust Representations from Simulation for Trajectory Prediction | Junwei Liang; Lu Jiang; Alexander Hauptmann; | We propose a novel approach to learn robust representation through augmenting the simulation training data such that the representation can better generalize to unseen real-world test data. |
567 | ScribbleBox: Interactive Annotation Framework for Video Object Segmentation | Bowen Chen; Huan Ling; Xiaohui Zeng; Jun Gao; Ziyue Xu; Sanja Fidler; | We introduce ScribbleBox, an interactive framework for annotating object instances with masks in videos with a significant boost in efficiency. |
568 | Rethinking Pseudo-LiDAR Representation | Xinzhu Ma; Shinan Liu; Zhiyi Xia; Hongwen Zhang; Xingyu Zeng; Wanli Ouyang; | In this paper, we perform an in-depth investigation and observe that the pseudo-LiDAR representation is effective because of the coordinate transformation, instead of data representation itself. |
569 | Deep Multi Depth Panoramas for View Synthesis | Kai-En Lin; Zexiang Xu; Ben Mildenhall; Pratul P. Srinivasan; Yannick Hold-Geoffroy; Stephen DiVerdi; Qi Sun; Kalyan Sunkavalli; Ravi Ramamoorthi; | We propose a learning-based approach for novel view synthesis for multi-camera 360$^ |
570 | MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection | Fa-Ting Hong; Xuanteng Huang; Wei-Hong Li; Wei-Shi Zheng; | In this work, we propose casting weakly supervised video highlight detection modeling for a given specific event as a multiple instance ranking network (MINI-Net) learning. |
571 | ContactPose: A Dataset of Grasps with Object Contact and Hand Pose | Samarth Brahmbhatt; Chengcheng Tang; Christopher D. Twigg; Charles C. Kemp; James Hays; | We introduce ContactPose, the first dataset of hand-object contact paired with hand pose, object pose, and RGB-D images. |
572 | API-Net: Robust Generative Classifier via a Single Discriminator | Xinshuai Dong; Hong Liu; Rongrong Ji; Liujuan Cao; Qixiang Ye; Jianzhuang Liu; Qi Tian; | This work aims for a solution of generative classifiers that can profit from the merits of both. |
573 | Bias-based Universal Adversarial Patch Attack for Automatic Check-out | Aishan Liu; Jiakai Wang; Xianglong Liu; Bowen Cao; Chongzhi Zhang; Hang Yu; | To address the problem, this paper proposes a bias-based framework to generate class-agnostic universal adversarial patches with strong generalization ability, which exploits both the perceptual and semantic bias of models. |
574 | Imbalanced Continual Learning with Partitioning Reservoir Sampling | Chris Dongjoo Kim; Jinseo Jeong; Gunhee Kim; | We jointly address the two independently solved problems, Catastrophic Forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. |
575 | Guided Collaborative Training for Pixel-wise Semi-Supervised Learning | Zhanghan Ke; Di Qiu; Kaican Li; Qiong Yan; Rynson W.H. Lau; | In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. |
576 | Stacking Networks Dynamically for Image Restoration Based on the Plug-and-Play Framework | Haixin Wang; Tianhao Zhang; Muzhi Yu; Jinan Sun; Wei Ye; Chen Wang; Shikun Zhang; | To address this challenge, we leverage the iterative process of the traditional plug-and-play method to provide a dynamic stacked network for Image Restoration. |
577 | Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight | Ming Sun; Haoxuan Dou; Junjie Yan; | To remedy the above issues, we reduce the super-network size by randomly dropping connection between network blocks while embedding a larger search space. |
578 | Spatial Attention Pyramid Network for Unsupervised Domain Adaptation | Congcong Li; Dawei Du; Libo Zhang; Longyin Wen; Tiejian Luo; Yanjun Wu; Pengfei Zhu; | To that end, in this paper, we design a new spatial attention pyramid network for unsupervised domain adaptation. |
579 | GSIR: Generalizable 3D Shape Interpretation and Reconstruction | Jianren Wang; Zhaoyuan Fang; | We propose to recover 3D shape structures as cuboids from partially reconstructed objects and use the predicted structures to further guide 3D reconstruction. |
580 | Weakly Supervised 3D Object Detection from Lidar Point Cloud | Qinghao Meng; Wenguan Wang; Tianfei Zhou; Jianbing Shen; Luc Van Gool; Dengxin Dai; | This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances. |
581 | Two-phase Pseudo Label Densification for Self-training based Domain Adaptation | Inkyu Shin; Sanghyun Woo; Fei Pan; In So Kweon; | In order to tackle this problem, we propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD. |
582 | Adaptive Offline Quintuplet Loss for Image-Text Matching | Tianlang Chen; Jiajun Deng; Jiebo Luo; | In this paper, we propose solutions by sampling negatives offline from the whole training set. |
583 | Learning Object Placement by Inpainting for Compositional Data Augmentation | Lingzhi Zhang; Tarmily Wen; Jie Min; Jiancong Wang; David Han; Jianbo Shi; | We propose a self-learning framework that automatically generates the necessary training data without any manual labeling by detecting, cutting, and inpainting objects from an image. |
584 | Deep Vectorization of Technical Drawings | Vage Egiazarian; Oleg Voynov; Alexey Artemov; Denis Volkhonskiy; Aleksandr Safin; Maria Taktasheva; Denis Zorin; Evgeny Burnaev; | We present a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images. |
585 | CAD-Deform: Deformable Fitting of CAD Models to 3D Scans | Vladislav Ishimtsev; Alexey Bokhovkin; Alexey Artemov; Savva Ignatyev; Matthias Niessner; Denis Zorin; Evgeny Burnaev; | In this work, we address this shortcoming by introducing CAD-Deform, a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. |
586 | An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices | Xiaolong Ma; Wei Niu; Tianyun Zhang; Sijia Liu; Sheng Lin; Hongjia Li; Wujie Wen; Xiang Chen; Jian Tang; Kaisheng Ma; Bin Ren; Yanzhi Wang; | To solve the problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. |
587 | AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points | Yuexin Ma; Xinge Zhu; Xinjing Cheng; Ruigang Yang; Jiming Liu; Dinesh Manocha; | In this paper, we present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction to use raw videos directly. |
588 | Multi-Agent Embodied Question Answering in Interactive Environments | Sinan Tan; Weilai Xiang; Huaping Liu; Di Guo; Fuchun Sun; | We investigate a new AI task — Multi-Agent Interactive Question Answering — where several agents explore the scene jointly in interactive environments to answer a question. |
589 | Conditional Sequential Modulation for Efficient Global Image Retouching | Jingwen He; Yihao Liu; Yu Qiao; Chao Dong; | In this paper, we investigate some commonly-used retouching operations and mathematically find that these pixel-independent operations can be approximated or formulated by multi-layer perceptrons (MLPs). |
590 | Segmenting Transparent Objects in the Wild | Enze Xie; Wenjia Wang; Wenhai Wang; Mingyu Ding; Chunhua Shen; Ping Luo; | To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets. |
591 | Length-Controllable Image Captioning | Chaorui Deng; Ning Ding; Mingkui Tan; Qi Wu; | In this paper, we propose to use a simple length level embedding to endow them with this ability. |
592 | Few-Shot Semantic Segmentation with Democratic Attention Networks | Haochen Wang; Xudong Zhang; Yutao Hu; Yandan Yang; Xianbin Cao; Xiantong Zhen; | In this paper, we propose the Democratic Attention Network (DAN) for few-shot semantic segmentation. |
593 | Defocus Blur Detection via Depth Distillation | Xiaodong Cun; Chi-Man Pun; | To solve these problems, we introduce depth information into DBD for the first time. |
594 | Motion Guided 3D Pose Estimation from Videos | Jingbo Wang; Sijie Yan; Yuanjun Xiong; Dahua Lin; | We propose a new loss function, called motion loss, for the problem of monocular 3D Human pose estimation from 2D pose. |
595 | Reflection Separation via Multi-bounce Polarization State Tracing | Rui Li; Simeng Qiu; Guangming Zang; Wolfgang Heidrich; | In this paper we aim to generalize the reflection removal to real-world scenarios with more complicated light interactions. |
596 | SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | Jiale Cao; Rao Muhammad Anwer; Hisham Cholakkal; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao; | We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating mask prediction of an instance to different sub-regions of a detected bounding-box. |
597 | SemanticAdv: Generating Adversarial Examples via Attribute-conditioned Image Editing | Haonan Qiu; Chaowei Xiao; Lei Yang; Xinchen Yan; Honglak Lee; Bo Li; | In this paper, we propose SemanticAdv to generate a new type of semantically realistic adversarial examples via attribute-conditioned image editing. |
598 | Learning with Noisy Class Labels for Instance Segmentation | Longrong Yang; Fanman Meng; Hongliang Li; Qingbo Wu; Qishang Cheng; | To solve this issue, a novel method is proposed in this paper, which uses different losses describing different roles of noisy class labels to enhance the learning. |
599 | Deep Image Clustering with Category-Style Representation | Junjie Zhao; Donghuan Lu; Kai Ma; Yu Zhang; Yefeng Zheng; | In this paper, we propose a novel deep image clustering framework to learn a category-style latent representation in which the category information is disentangled from image style and can be directly used as the cluster assignment. |
600 | Self-supervised Motion Representation via Scattering Local Motion Cues | Yuan Tian; Zhaohui Che; Wenbo Bao; Guangtao Zhai; Zhiyong Gao; | In this paper, we leverage the massive unlabeled video data to learn an accurate explicit motion representation that aligns well with the semantic distribution of the moving objects. |
601 | Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets | Tian Chen; Shijie An; Yuan Zhang; Chongyang Ma; Huayan Wang; Xiaoyan Guo; Wen Zheng; | One key limitation of existing approaches lies in their lack of structural information exploitation, which leads to inaccurate spatial layout, discontinuous surface, and ambiguous boundaries. In this paper, we tackle this problem in three aspects. |
602 | BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation | Junheum Park; Keunsoo Ko; Chul Lee; Chang-Su Kim; | We propose a novel deep-learning-based video interpolation algorithm based on bilateral motion estimation. |
603 | Hard negative examples are hard, but useful | Hong Xuan; Abby Stylianou; Xiaotong Liu; Robert Pless; | In this paper, we characterize the space of triplets and derive why hard negatives make triplet loss training fail. |
604 | ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions | Zechun Liu; Zhiqiang Shen; Marios Savvides; Kwang-Ting Cheng; | In this paper, we propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost. |
605 | Video Object Detection via Object-level Temporal Aggregation | Chun-Han Yao; Chen Fang; Xiaohui Shen; Yangyue Wan; Ming-Hsuan Yang; | In this work we propose to improve video object detection via temporal aggregation. |
606 | Object Detection with a Unified Label Space from Multiple Datasets | Xiangyun Zhao; Samuel Schulter; Gaurav Sharma; Yi-Hsuan Tsai; Manmohan Chandraker; Ying Wu; | Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. |
607 | Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D | Jonah Philion; Sanja Fidler; | We propose a new end-to-end architecture that directly extracts a bird’s-eye-view representation of a scene given image data from an arbitrary number of cameras. |
608 | Comprehensive Image Captioning via Scene Graph Decomposition | Yiwu Zhong; Liwei Wang; Jianshu Chen; Dong Yu; Yin Li; | We address the challenging problem of image captioning by revisiting the representation of image scene graph. |
609 | Symbiotic Adversarial Learning for Attribute-based Person Search | Yu-Tong Cao; Jingya Wang; Dacheng Tao; | In this paper, we present a symbiotic adversarial learning framework, called SAL. |
610 | Amplifying Key Cues for Human-Object-Interaction Detection | Yang Liu; Qingchao Chen; Andrew Zisserman; | In this paper we introduce two methods to amplify key cues in the image, and also a method to combine these and other cues when considering the interaction between a human and an object. |
611 | Rethinking Few-shot Image Classification: A Good Embedding is All You Need? | Yonglong Tian; Yue Wang; Dilip Krishnan; Joshua B. Tenenbaum; Phillip Isola; | In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. |
612 | Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization | Kyle Min; Jason J. Corso; | Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring. To address this issue, we propose a novel method named A2CL-PT. |
613 | Action Localization through Continual Predictive Learning | Sathyanarayanan Aakur; Sudeep Sarkar; | In this paper, we present a new approach based on continual learning that uses feature-level predictions for self-supervision. |
614 | Generative View-Correlation Adaptation for Semi-Supervised Multi-View Learning | Yunyu Liu; Lichen Wang; Yue Bai; Can Qin; Zhengming Ding; Yun Fu; | To address the challenges, we propose a novel View-Correlation Adaptation (VCA) framework in semi-supervised fashion. |
615 | READ: Reciprocal Attention Discriminator for Image-to-Video Re-Identification | Minho Shim; Hsuan-I Ho; Jinhyung Kim; Dongyoon Wee; | In this work, we focus on image-to-video re-ID which compares a single query image to videos in the gallery. |
616 | 3D Human Shape Reconstruction from a Polarization Image | Shihao Zou; Xinxin Zuo; Yiming Qian; Sen Wang; Chi Xu; Minglun Gong; Li Cheng; | This paper tackles the problem of estimating 3D body shape of clothed humans from single polarized 2D images, i.e. polarization images. |
617 | The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification | Pirazh Khorramshahi; Neehar Peri; Jun-cheng Chen; Rama Chellappa; | In this paper, we present Self-supervised Attention for Vehicle Re-identification (SAVER), a novel approach to effectively learn vehicle-specific discriminative features. |
618 | Improving One-stage Visual Grounding by Recursive Sub-query Construction | Zhengyuan Yang; Tianlang Chen; Liwei Wang; Jiebo Luo; | To address this query modeling deficiency, we propose a recursive sub-query construction framework, which reasons between image and query for multiple rounds and reduces the referring ambiguity step by step. |
619 | Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video | Jianyi Wang; Xin Deng; Mai Xu; Congyong Chen; Yuhang Song; | In this paper, we focus on enhancing the perceptual quality of compressed video. |
620 | Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision | Haitian Zheng; Haofu Liao; Lele Chen; Wei Xiong; Tianlang Chen; Jiebo Luo; | In this paper, we tackle a more challenging and general task, where the exemplar is a scene image that is semantically different from the given label map. |
621 | Content-Consistent Matching for Domain Adaptive Semantic Segmentation | Guangrui Li; Guoliang Kang; Wu Liu; Yunchao Wei; Yi Yang; | This paper considers the adaptation of semantic segmentation from the synthetic source domain to the real target domain. |
622 | AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting | Wenhai Wang; Xuebo Liu; Xiaozhong Ji; Enze Xie; Ding Liang; ZhiBo Yang; Tong Lu; Chunhua Shen; Ping Luo; | Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection. |
623 | History Repeats Itself: Human Motion Prediction via Motion Attention | Wei Mao; Miaomiao Liu; Mathieu Salzmann; | Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. |
624 | Unsupervised Video Object Segmentation with Joint Hotspot Tracking | Lu Zhang; Jianming Zhang; Zhe Lin; Radomír Měch; Huchuan Lu; You He; | Specifically, we propose a Weighted Correlation Siamese Network (WCS-Net) which employs a Weighted Correlation Block (WCB) for encoding the pixel-wise correspondence between a template frame and the search frame. |
625 | SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach | Ailing Zeng; Xiao Sun; Fuyang Huang; Minhao Liu; Qiang Xu; Stephen Lin; | We propose to take advantage of this fact for better generalization to rare and unseen poses. |
626 | CAFE-GAN: Arbitrary Face Attribute Editing with Complementary Attention Feature | Jeong gi Kwak; David K. Han; Hanseok Ko; | To address this unintended altering problem, we propose a novel GAN model which is designed to edit only the parts of a face pertinent to the target attributes by the concept of Complementary Attention Feature (CAFE). |
627 | MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection | Xin Lu; Quanquan Li; Buyu Li; Junjie Yan; | In this paper, we propose MimicDet, a novel and efficient framework to train a one-stage detector by directly mimicking the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors. |
628 | Latent Topic-aware Multi-Label Classification | Jianghong Ma; Yang Liu; | This paper shows that sample and feature extraction, two important procedures for removing noisy and redundant information encoded in training samples from both the sample and feature perspectives, can be effectively and efficiently performed in the latent topic space by considering topic-based feature-label correlation. |
629 | Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning | Xiangxi Shi; Xu Yang; Jiuxiang Gu; Shafiq Joty; Jianfei Cai; | In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task. |
630 | Attract, Perturb, and Explore: Learning a Feature Alignment Network for Semi-supervised Domain Adaptation | Taekyung Kim; Changick Kim; | We propose an SSDA framework that aims to align features via alleviation of the intra-domain discrepancy. |
631 | Curriculum Manager for Source Selection in Multi-Source Domain Adaptation | Luyu Yang; Yogesh Balaji; Ser-Nam Lim; Abhinav Shrivastava; | In this paper, we proposed an adversarial agent that learns a dynamic curriculum for source samples, called Curriculum Manager for Source Selection (CMSS). |
632 | Powering One-shot Topological NAS with Stabilized Share-parameter Proxy | Ronghao Guo; Chen Lin; Chuming Li; Keyu Tian; Ming Sun; Lu Sheng; Junjie Yan; | In this work, we try to enhance the one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space (i.e., over 3.4×10^10 different topological structures). |
633 | Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation | Haoran Wang; Tong Shen; Wei Zhang; Ling-Yu Duan; Tao Mei; | To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains. |
634 | Boundary-preserving Mask R-CNN | Tianheng Cheng; Xinggang Wang; Lichao Huang; Wenyu Liu; | To remedy this, we propose a conceptually simple yet effective Boundary-guided Mask R-CNN (BMask R-CNN) to leverage object boundary information to improve mask localization accuracy. |
635 | Self-supervised Single-view 3D Reconstruction via Semantic Consistency | Xueting Li; Sifei Liu; Kihwan Kim; Shalini De Mello; Varun Jampani; Ming-Hsuan Yang; Jan Kautz; | The key insight of our work is that objects can be represented as a collection of deformable parts, and each part is semantically coherent across different instances of the same category (e.g., wings on birds and wheels on cars). |
636 | MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation | Benlin Liu; Yongming Rao; Jiwen Lu; Jie Zhou; Cho-Jui Hsieh; | Specifically, we propose that better soft targets with higher compatibility can be generated by using a label generator to fuse the feature maps from deeper stages in a top-down manner, and we can employ the meta-learning technique to optimize this label generator. |
637 | Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling | Yuliang Zou; Pan Ji; Quoc-Huy Tran; Jia-Bin Huang; Manmohan Chandraker; | In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. |
638 | The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation | Tao Wang; Yu Li; Bingyi Kang; Junnan Li; Junhao Liew; Sheng Tang; Steven Hoi; Jiashi Feng; | Based on such an observation, we first consider various techniques for improving long-tail classification performance which indeed enhance instance segmentation results. We then propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach. |
639 | What is Learned in Deep Uncalibrated Photometric Stereo? | Guanying Chen; Michael Waechter; Boxin Shi; Kwan-Yee K. Wong; Yasuyuki Matsushita; | In this paper, we analyze the features learned by this method and find that they strikingly resemble attached shadows, shadings, and specular highlights, which are known to provide useful clues in resolving the generalized bas-relief (GBR) ambiguity. |
640 | Prior-based Domain Adaptive Object Detection for Hazy and Rainy Conditions | Vishwanath A. Sindagi; Poojan Oza; Rajeev Yasarla; Vishal M. Patel; | To address this issue, we propose an unsupervised prior-based domain adversarial object detection framework for adapting the detectors to hazy and rainy conditions. |
641 | Adversarial Ranking Attack and Defense | Mo Zhou; Zhenxing Niu; Le Wang; Qilin Zhang; Gang Hua; | In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations. |
642 | ReDro: Efficiently Learning Large-sized SPD Visual Representation | Saimunur Rahman; Lei Wang; Changming Sun; Luping Zhou; | This work proposes a novel scheme called Relation Dropout (ReDro). It is inspired by the fact that eigen-decomposition of a block diagonal matrix can be efficiently obtained by decomposing each of its diagonal square matrices, which are of smaller sizes. |
643 | Graph-Based Social Relation Reasoning | Wanhua Li; Yueqi Duan; Jiwen Lu; Jianjiang Feng; Jie Zhou; | In this paper, we propose a simpler, faster, and more accurate method named graph relational reasoning network (GR$^2$N) for social relation recognition. |
644 | EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection | Tengteng Huang; Zhe Liu; Xiwu Chen; Xiang Bai; | In this paper, we aim at addressing two critical issues in the 3D detection task, including the exploitation of multiple sensors (namely LiDAR point cloud and camera image), as well as the inconsistency between the localization and classification confidence. |
645 | Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency | Jiaxiang Shang; Tianwei Shen; Shiwei Li; Lei Zhou; Mingmin Zhen; Tian Fang; Long Quan; | In contrast to previous works that only enforce 2D feature constraints, we propose a self-supervised training architecture by leveraging the multi-view geometry consistency, which provides reliable constraints on face pose and depth estimation. |
646 | Asynchronous Interaction Aggregation for Action Detection | Jiajun Tang; Jin Xia; Xinzhi Mu; Bo Pang; Cewu Lu; | We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection. |
647 | Shape and Viewpoint without Keypoints | Shubham Goel; Angjoo Kanazawa; Jitendra Malik; | We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision. |
648 | Learning Attentive and Hierarchical Representations for 3D Shape Recognition | Jiaxin Chen; Jie Qin; Yuming Shen; Li Liu; Fan Zhu; Ling Shao; | This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR). |
649 | TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search | Yibo Hu; Xiang Wu; Ran He; | In this paper, we rethink three freedoms of differentiable NAS, i.e. operation-level, depth-level and width-level, and propose a novel method, named Three-Freedom NAS (TF-NAS), to achieve both good classification accuracy and precise latency constraint. |
650 | Associative3D: Volumetric Reconstruction from Sparse Views | Shengyi Qian; Linyi Jin; David F. Fouhey; | We propose a new approach that estimates reconstructions, distributions over the camera/object and camera/camera transformations, as well as an inter-view object affinity matrix. |
651 | PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit | Yongqiang Mou; Lei Tan; Hui Yang; Jingying Chen; Leyuan Liu; Rui Yan; Yaohong Huang; | In this paper, we address the problem of recognizing degradation images that are suffering from high blur or low-resolution. |
652 | Memory Selection Network for Video Propagation | Ruizheng Wu; Huaijia Lin; Xiaojuan Qi; Jiaya Jia; | To tackle this challenge, we propose a memory selection network, which learns to select suitable guidance from all previous frames for effective and robust propagation. |
653 | Disentangled Non-local Neural Networks | Minghao Yin; Zhuliang Yao; Yue Cao; Xiu Li; Zheng Zhang; Stephen Lin; Han Hu; | Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms. |
654 | URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Seonguk Seo; Joon-Young Lee; Bohyung Han; | We propose a unified referring video object segmentation network (URVOS). |
655 | Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup | Chuanchen Luo; Chunfeng Song; Zhaoxiang Zhang; | As for the latter issue, we propose a novel cross-domain mixup scheme. |
656 | Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks | Yan Liu; Lingqiao Liu; Peng Wang; Pingping Zhang; Yinjie Lei; | Specifically, we propose a novel semi-supervised crowd counting method built upon two innovative components: (1) a set of inter-related binary segmentation tasks derived from the original density map regression task as the surrogate prediction target; and (2) surrogate target predictors learned from both labeled and unlabeled data via a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks. |
657 | Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | Hongkai Zhang; Hong Chang; Bingpeng Ma; Naiyan Wang; Xilin Chen; | In this work, we first point out the inconsistency problem between the fixed network settings and the dynamic training procedure, which greatly affects the performance. |
658 | Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip | Weilun Chen; Zhaoxiang Zhang; Xiaolin Hu; Baoyuan Wu; | In this paper, we show that just randomly flipping the signs of a small number of entries in adversarial perturbations can significantly boost the attack performance. |
659 | Knowledge Transfer via Dense Cross-Layer Mutual-Distillation | Anbang Yao; Dawei Sun; | In this paper, we propose Dense Cross-layer Mutual-distillation (DCM), an improved two-way KT method in which the teacher and student networks are trained collaboratively from scratch. |
660 | Matching Guided Distillation | Kaiyu Yue; Jiangfan Deng; Feng Zhou; | In this paper, we present Matching Guided Distillation (MGD) as an efficient and parameter-free manner to solve these problems. |
661 | Clustering Driven Deep Autoencoder for Video Anomaly Detection | Yunpeng Chang; Zhigang Tu; Wei Xie; Junsong Yuan; | Since the abnormal events are usually different from normal events in appearance and/or in motion behavior, we address this issue by designing a novel convolution autoencoder architecture to separately capture spatial and temporal informative representation. |
662 | Learning to Compose Hypercolumns for Visual Correspondence | Juhong Min; Jongmin Lee; Jean Ponce; Minsu Cho; | In this work, we introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match. |
663 | Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction | Lei Zhou; Zixin Luo; Mingmin Zhen; Tianwei Shen; Shiwei Li; Zhuofei Huang; Tian Fang; Long Quan; | In this work, we propose a stochastic bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM iterations to improve the efficiency and scalability. |
664 | Object-based Illumination Estimation with Rendering-aware Neural Networks | Xin Wei; Guojun Chen; Yue Dong; Stephen Lin; Xin Tong; | We present a scheme for fast environment light estimation from the RGBD appearance of individual objects and their local image areas. |
665 | Progressive Point Cloud Deconvolution Generation Network | Le Hui; Rui Xu; Jin Xie; Jianjun Qian; Jian Yang; | In this paper, we propose an effective point cloud generation method, which can generate multi-resolution point clouds of the same shape from a latent vector. |
666 | SSCGAN: Facial Attribute Editing via Style Skip Connections | Wenqing Chu; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Rongrong Ji; | In this work, we focus on solving this issue by editing the channel-wise global information denoted as the style feature. |
667 | Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology | Hiroki Tokunaga; Brian Kenji Iwana; Yuki Teramoto; Akihiko Yoshizawa; Ryoma Bise; | In this paper, we propose a subtype segmentation method that uses such proportional labels as weakly supervised labels. |
668 | Learn to Propagate Reliably on Noisy Affinity Graphs | Lei Yang; Qingqiu Huang; Huaiyi Huang; Linning Xu; Dahua Lin; | To overcome these difficulties, we propose a new framework that allows labels to be propagated reliably on large-scale real-world data. |
669 | Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search | Xiangxiang Chu; Tianbao Zhou; Bo Zhang; Jixiang Li; | Thereby, we present a novel approach called Fair DARTS where the exclusive competition is relaxed to be collaborative. |
670 | TANet: Towards Fully Automatic Tooth Arrangement | Guodong Wei; Zhiming Cui; Yumeng Liu; Nenglun Chen; Runnan Chen; Guiqing Li; Wenping Wang; | In this work, we proposed a learning-based method for fast and automatic tooth arrangement. |
671 | UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection | Bumsoo Kim; Taeho Choi; Jaewoo Kang; Hyunwoo J. Kim; | To tackle this problem, we propose UnionDet, a one-stage meta architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. |
672 | GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision | Lei Ke; Shichao Li; Yanan Sun; Yu-Wing Tai; Chi-Keung Tang; | We present a novel end-to-end framework named GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from a single urban street view. |
673 | Resolution Switchable Networks for Runtime Efficient Image Recognition | Yikai Wang; Fuchun Sun; Duo Li; Anbang Yao; | We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference. |
674 | SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation | Jianan Zhen; Qi Fang; Jiaming Sun; Wentao Liu; Wei Jiang; Hujun Bao; Xiaowei Zhou; | In this paper, we propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm. |
675 | Learning to Detect Open Classes for Universal Domain Adaptation | Bo Fu; Zhangjie Cao; Mingsheng Long; Jianmin Wang; | Towards accurate open class detection, we propose Calibrated Multiple Uncertainties (CMU) with a novel transferability measure estimated by a mixture of uncertainty quantities in complementation: entropy, confidence and consistency, defined on conditional probabilities calibrated by a multi-classifier ensemble model. |
676 | Visual Compositional Learning for Human-Object Interaction Detection | Zhi Hou; Xiaojiang Peng; Yu Qiao; Dacheng Tao; | We devise a deep Visual Compositional Learning (VCL) framework, which is a simple yet efficient framework to effectively address this problem. |
677 | Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches | Shuai Yang; Zhangyang Wang; Jiaying Liu; Zongming Guo; | In this paper, we propose Deep Plastic Surgery, a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs. |
678 | Rethinking Class Activation Mapping for Weakly Supervised Object Localization | Wonho Bae; Junhyug Noh; Gunhee Kim; | We propose three simple but robust techniques that alleviate the problems, including thresholded average pooling, negative weight clamping, and percentile as a standard for thresholding. |
679 | OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features | Anton Osokin; Denis Sumin; Vasily Lomakin; | In this paper, we consider the task of one-shot object detection, which consists in detecting objects defined by a single demonstration. |
680 | Interpretable Neural Network Decoupling | Yuchao Li; Rongrong Ji; Shaohui Lin; Baochang Zhang; Chenqian Yan; Yongjian Wu; Feiyue Huang; Ling Shao; | In this paper, we propose a novel architecture decoupling method to interpret the network from a perspective of investigating its calculation paths. |
681 | Omni-sourced Webly-supervised Learning for Video Recognition | Haodong Duan; Yue Zhao; Yuanjun Xiong; Wentao Liu; Dahua Lin; | We introduce OmniSource, a novel framework for leveraging web data to train video recognition models. |
682 | CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending | Hang Xu; Shaoju Wang; Xinyue Cai; Wei Zhang; Xiaodan Liang; Zhenguo Li; | In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending. |
683 | Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation | Jiaxing Huang; Shijian Lu; Dayan Guan; Xiaobing Zhang; | This paper presents an innovative local contextual-relation consistent domain adaptation (CrCDA) technique that aims to achieve local-level consistencies during the global-level alignment. |
684 | Estimating People Flows to Better Count Them in Crowded Scenes | Weizhe Liu; Mathieu Salzmann; Pascal Fua; | In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing. |
685 | Generate to Adapt: Resolution Adaption Network for Surveillance Face Recognition | Han Fang; Weihong Deng; Yaoyao Zhong; Jiani Hu; | To avoid this problem, we propose a novel resolution adaption network (RAN) which contains Multi-Resolution Generative Adversarial Networks (MR-GAN) followed by a feature adaption network. |
686 | Learning Feature Embeddings for Discriminant Model based Tracking | Linyu Zheng; Ming Tang; Yingying Chen; Jinqiao Wang; Hanqing Lu; | After observing that the features used in most online discriminatively trained trackers are not optimal, in this paper, we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking. |
687 | WeightNet: Revisiting the Design Space of Weight Networks | Ningning Ma; Xiangyu Zhang; Jiawei Huang; Jian Sun; | We present a conceptually simple, flexible and effective framework for weight generating networks. |
688 | Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift | Ryuhei Takahashi; Atsushi Hashimoto; Motoharu Sonogashira; Masaaki Iiyama; | This paper discusses unsupervised domain adaptation (UDA) with target shift, i.e., UDA with the non-identical label distributions of the source and target domains. |
689 | Learning Where to Focus for Efficient Video Object Detection | Zhengkai Jiang; Yu Liu; Ceyuan Yang; Jihao Liu; Peng Gao; Qian Zhang; Shiming Xiang; Chunhong Pan; | Therefore, a novel module called Learnable Spatio-Temporal Sampling (LSTS) has been proposed to learn semantic-level correspondences among frame features accurately. |
690 | Learning Object Permanence from Video | Aviv Shamsian; Ofri Kleinfeld; Amir Globerson; Gal Chechik; | Here we introduce the setup of learning Object Permanence from labeled videos. |
691 | Adaptive Text Recognition through Visual Matching | Chuhan Zhang; Ankush Gupta; Andrew Zisserman; | We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic modelling stages through intermediate representations in the form of similarity maps. |
692 | Actions as Moving Points | Yixuan Li; Zixu Wang; Limin Wang; Gangshan Wu; | In this paper, we present a conceptually simple, computationally efficient, and more precise action tubelet detection framework, termed as MovingCenter Detector (MOC-detector), by treating an action instance as a trajectory of moving points. |
693 | Learning to Exploit Multiple Vision Modalities by Using Grafted Networks | Yuhuang Hu; Tobi Delbruck; Shih-Chii Liu; | This paper proposes a Network Grafting Algorithm (NGA), where a new front end network driven by unconventional visual inputs replaces the front end network of a pretrained deep network that processes intensity frames. |
694 | Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild | Alexander Grabner; Yaming Wang; Peizhao Zhang; Peihong Guo; Tong Xiao; Peter Vajda; Peter M. Roth; Vincent Lepetit; | We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. |
695 | 3D Fluid Flow Reconstruction Using Compact Light Field PIV | Zhong Li; Yu Ji; Jingyi Yu; Jinwei Ye; | In this paper, we present a PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow. |
696 | Contextual Diversity for Active Learning | Sharat Agarwal; Himanshu Arora; Saket Anand; Chetan Arora; | Since the context is difficult to evaluate in the absence of ground-truth labels, we introduce the notion of contextual diversity that captures the confusion associated with spatially co-occurring classes. |
697 | Temporal Aggregate Representations for Long-Range Video Understanding | Fadime Sener; Dipika Singhania; Angela Yao; | In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. |
698 | Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition | Zhe Niu; Brian Mak; | In this paper, we propose novel stochastic modeling of various components of a continuous sign language recognition (CSLR) system that is based on the transformer encoder and connectionist temporal classification (CTC). |
699 | General 3D Room Layout from a Single View by Render-and-Compare | Sinisa Stekovic; Shreyas Hampali; Mahdi Rad; Sayan Deb Sarkar; Friedrich Fraundorfer; Vincent Lepetit; | We present a novel method to reconstruct the 3D layout of a room (walls, floors, ceilings) from a single perspective view in challenging conditions, by contrast with previous single-view methods restricted to cuboid-shaped layouts. |
700 | Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints | Vikramjit Sidhu; Edgar Tretschk; Vladislav Golyanik; Antonio Agudo; Christian Theobalt; | We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. |
701 | Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability | Anelise Newman; Camilo Fosco; Vincent Casser; Allen Lee; Barry McNamara; Aude Oliva; | We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. |
702 | Yet Another Intermediate-Level Attack | Qizhang Li; Yiwen Guo; Hao Chen; | In this paper, we propose a novel method to enhance the black-box transferability of baseline adversarial examples. |
703 | Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction | Chao Li; Xiaohu Guo; | In this paper, the classic framework is re-designed to enable 4D reconstruction of dynamic scene under topology changes, by introducing a novel structure of Non-manifold Volumetric Grid to the re-design of both TSDF and EDG, which allows connectivity updates by cell splitting and replication. |
704 | Early Exit Or Not: Resource-Efficient Blind Quality Enhancement for Compressed Images | Qunliang Xing; Mai Xu; Tianyi Li; Zhenyu Guan; | In this paper, we propose a resource-efficient blind quality enhancement (RBQE) approach for compressed images. |
705 | PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations | Edgar Tretschk; Ayush Tewari; Vladislav Golyanik; Michael Zollhöfer; Carsten Stoll; Christian Theobalt; | In this paper, we present a mid-level patch-based surface representation. |
706 | How does Lipschitz Regularization Influence GAN Training? | Yipeng Qin; Niloy Mitra; Peter Wonka; | In this work, we uncover an even more important effect of Lipschitz regularization by examining its impact on the loss function: It degenerates GAN loss functions to almost linear ones by restricting their domain and interval of attainable gradient values. |
707 | Infrastructure-based Multi-Camera Calibration using Radial Projections | Yukai Lin; Viktor Larsson; Marcel Geppert; Zuzana Kukelova; Marc Pollefeys; Torsten Sattler; | In this paper, we propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach. |
708 | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Heeseung Kwon; Manjin Kim; Suha Kwak; Minsu Cho; | In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features. |
709 | Polarized Optical-Flow Gyroscope | Masada Tzabari; Yoav Y. Schechner; | We merge by generalization two principles of passive optical sensing of motion. |
710 | Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation | Da Li; Timothy Hospedales; | In this paper we take an orthogonal perspective and propose a framework to further enhance performance by meta-learning the initial conditions of existing DA algorithms. |
711 | An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning | Yaoyao Liu; Bernt Schiele; Qianru Sun; | In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. |
712 | On the Effectiveness of Image Rotation for Open Set Domain Adaptation | Silvia Bucci; Mohammad Reza Loghmani; Tatiana Tommasi; | We propose a novel method to address both these problems using the self-supervised task of rotation recognition. |
713 | Combining Task Predictors via Enhancing Joint Predictability | Kwang In Kim; Christian Richardt; Hyung Jin Chang; | We present a new predictor combination algorithm that improves the target by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance. |
714 | Multi-Scale Positive Sample Refinement for Few-Shot Object Detection | Jiaxi Wu; Songtao Liu; Di Huang; Yunhong Wang; | To this end, we propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD. |
715 | Single-Image Depth Prediction Makes Feature Matching Easier | Carl Toft; Daniyar Turmukhambetov; Torsten Sattler; Fredrik Kahl; Gabriel J. Brostow; | In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. |
716 | Deep Reinforced Attention Learning for Quality-Aware Visual Recognition | Duo Li; Qifeng Chen; | In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks and disclose the effectiveness of attention modules more straightforwardly to fully exploit their potential. |
717 | CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization | Yuxi Li; Weiyao Lin; John See; Ning Xu; Shugong Xu; Ke Yan; Cong Yang; | In this paper, we propose Coarse-to-Fine Action Detector (CFAD), an original end-to-end trainable framework for efficient spatiotemporal action localization. |
718 | Learning Joint Spatial-Temporal Transformations for Video Inpainting | Yanhong Zeng; Jianlong Fu; Hongyang Chao; | In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. |
719 | Single Path One-Shot Neural Architecture Search with Uniform Sampling | Zichao Guo; Xiangyu Zhang; Haoyuan Mu; Wen Heng; Zechun Liu; Yichen Wei; Jian Sun; | This work proposes a Single Path One-Shot model to address the challenge in the training. |
720 | Learning to Generate Novel Domains for Domain Generalization | Kaiyang Zhou; Yongxin Yang; Timothy Hospedales; Tao Xiang; | This paper focuses on domain generalization (DG), the task of learning from multiple source domains a model that generalizes well to unseen domains. |
721 | Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections | Theodora Kontogianni; Michael Gygli; Jasper Uijlings; Vittorio Ferrari; | Instead, we recognize that user corrections can serve as sparse training examples and we propose a method that capitalizes on that idea to update the model parameters on-the-fly to the data at hand. |
722 | Impact of base dataset design on few-shot image classification | Othman Sbai; Camille Couprie; Mathieu Aubry; | In this paper, we systematically study the effect of variations in the training data by evaluating deep features trained on different image sets in a few-shot classification setting. |
723 | Invertible Zero-Shot Recognition Flows | Yuming Shen; Jie Qin; Lei Huang; Li Liu; Fan Zhu; Ling Shao; | To tackle the above limitations, for the first time, this work incorporates a new family of generative models (i.e., flow-based models) into ZSL. |
724 | GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes | Weidong Zhang; Wei Zhang; Yinda Zhang; | In this work, we propose to incorporate geometric reasoning to deep learning for layout estimation. Moreover, we present a new dataset with pixel-level depth annotation of dominant planes. |
725 | Location Sensitive Image Retrieval and Tagging | Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas; | In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. |
726 | Joint 3D Layout and Depth Prediction from a Single Indoor Panorama Image | Wei Zeng; Sezer Karaoglu; Theo Gevers; | In this paper, we propose a method which jointly learns layout prediction and depth estimation from a single indoor panorama image. |
727 | Guessing State Tracking for Visual Dialogue | Wei Pang; Xiaojie Wang; | This paper proposes a guessing state for the Guesser, and regards guess as a process with change of guessing state through a dialogue. |
728 | Memory-Efficient Incremental Learning Through Feature Adaptation | Ahmet Iscen; Jeffrey Zhang; Svetlana Lazebnik; Cordelia Schmid; | We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes, instead of the images themselves, unlike most existing work. |
729 | Neural Voice Puppetry: Audio-driven Facial Reenactment | Justus Thies; Mohamed Elgharib; Ayush Tewari; Christian Theobalt; Matthias Nießner; | We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. |
730 | One-Shot Unsupervised Cross-Domain Detection | Antonio D’Innocente; Francesco Cappio Borlino; Silvia Bucci; Barbara Caputo; Tatiana Tommasi; | This paper addresses this setting, presenting an object detection algorithm able to perform unsupervised adaption across domains by using only one target sample, seen at test time. |
731 | Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks | Majed El Helou; Ruofan Zhou; Sabine Süsstrunk; | We present an analysis, in the frequency domain, of degradation-kernel overfitting in super-resolution and introduce a conditional learning perspective that extends to both super-resolution and denoising. |
732 | Probabilistic Future Prediction for Video Scene Understanding | Anthony Hu; Fergal Cotter; Nikhil Mohan; Corina Gurau; Alex Kendall; | We present a novel deep learning architecture for probabilistic future prediction from video. |
733 | Suppressing Mislabeled Data via Grouping and Self-Attention | Xiaojiang Peng; Kai Wang; Zhaoyang Zeng; Qing Li; Jianfei Yang; Yu Qiao; | To suppress the impact of mislabeled data, this paper proposes a conceptually simple yet efficient training block, termed Attentive Feature Mixup (AFM), which allows paying more attention to clean samples and less to mislabeled ones via sample interactions in small groups. |
734 | Class-wise Dynamic Graph Convolution for Semantic Segmentation | Hanzhe Hu; Deyi Ji; Weihao Gan; Shuai Bai; Wei Wu; Junjie Yan; | In order to avoid potential misleading contextual information aggregation in previous work, we propose a class-wise dynamic graph convolution (CDGC) module to adaptively propagate information. |
735 | Character-Preserving Coherent Story Visualization | Yun-Zhu Song; Zhi Rui Tam; Hung-Jen Chen; Huiao-Han Lu; Hong-Han Shuai; | Therefore, we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges. |
736 | GINet: Graph Interaction Network for Scene Parsing | Tianyi Wu; Yu Lu; Yu Zhu; Chuang Zhang; Ming Wu; Zhanyu Ma; Guodong Guo; | In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). |
737 | Tensor Low-Rank Reconstruction for Semantic Segmentation | Wanli Chen; Xinge Zhu; Ruoqi Sun; Junjun He; Ruiyu Li; Xiaoyong Shen; Bei Yu; | In this paper, we propose a new approach to model the 3D context representations, which not only avoids the space compression, but also tackles the high-rank difficulty. |
738 | Attentive Normalization | Xilai Li; Wei Sun; Tianfu Wu; | In this paper, we propose a light-weight integration between the two schema. |
739 | Count- and Similarity-aware R-CNN for Pedestrian Detection | Jin Xie; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao; Mubarak Shah; | We propose an approach that leverages pedestrian count and proposal similarity information within a two-stage pedestrian detection framework. |
740 | TRADI: Tracking Deep Neural network Weight Distributions | Gianni Franchi; Andrei Bursuc; Emanuel Aldea; Séverine Dubuisson; Isabelle Bloch; | In this work we propose to make use of this knowledge and leverage it for computing the distributions of the weights of the DNN. |
741 | Spatiotemporal Attacks for Embodied Agents | Aishan Liu; Tairan Huang; Xianglong Liu; Yitao Xu; Yuqing Ma; Xinyun Chen; Stephen J. Maybank; Dacheng Tao; | In this work, we take the first step to study adversarial attacks for embodied agents. |
742 | Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation | Qingqiu Huang; Lei Yang; Huaiyi Huang; Tong Wu; Dahua Lin; | In this work, we propose a simple yet effective method, which trains a face recognition model by progressively expanding the labeled set via both selective propagation and caption-driven expansion. |
743 | Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild | Liqian Ma; Zhe Lin; Connelly Barnes; Alexei A Efros; Jingwan Lu; | To address this issue, we introduce unselfie, a novel photographic transformation that automatically translates a selfie into a neutral-pose portrait. |
744 | Design and Interpretation of Universal Adversarial Patches in Face Detection | Xiao Yang; Fangyun Wei; Hongyang Zhang; Jun Zhu; | We propose new optimization-based approaches to automatic design of universal adversarial patches for varying goals of the attack, including scenarios in which true positives are suppressed without introducing false positives. |
745 | Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Yang Xiao; Renaud Marlet; | We propose a meta-learning framework that can be applied to both tasks, possibly including 3D data. |
746 | Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints | Adrian Spurr; Umar Iqbal; Pavlo Molchanov; Otmar Hilliges; Jan Kautz; | Embracing this challenge we propose a set of novel losses that constrain the prediction of a neural network to lie within the range of biomechanically feasible 3D hand configurations. |
747 | Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification | Mang Ye; Jianbing Shen; David J. Crandall; Ling Shao; Jiebo Luo; | In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. |
748 | Contextual Heterogeneous Graph Network for Human-Object Interaction Detection | Hai Wang; Wei-shi Zheng; Ling Yingbiao; | In this work, we address such a problem for HOI task by proposing a heterogeneous graph network that models humans and objects as different kinds of nodes and incorporates intra-class messages between homogeneous nodes and inter-class messages between heterogeneous nodes. |
749 | Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning | Xi Cheng; Zhenyong Fu; Jian Yang; | In this work, we present a simple yet effective zero-shot image super-resolution model. |
750 | A Closest Point Proposal for MCMC-based Probabilistic Surface Registration | Dennis Madsen; Andreas Morel-Forster; Patrick Kahr; Dana Rahbani; Thomas Vetter; Marcel Lüthi; | We propose to view non-rigid surface registration as a probabilistic inference problem. |
751 | Interactive Video Object Segmentation Using Global and Local Transfer Modules | Yuk Heo; Yeong Jun Koh; Chang-Su Kim; | An interactive video object segmentation algorithm, which takes scribble annotations on query objects as input, is proposed in this paper. |
752 | End-to-end Interpretable Learning of Non-blind Image Deblurring | Thomas Eboli; Jian Sun; Jean Ponce; | We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels. |
753 | Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation | Junsong Fan; Zhaoxiang Zhang; Tieniu Tan; | Instead of struggling to refine a single seed, we propose a novel approach to alleviate the inaccurate seed problem by leveraging the segmentation model’s robustness to learn from multiple seeds. |
754 | Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection | Jing Zhang; Jianwen Xie; Nick Barnes; | In this paper, we propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples, where the noisy labels are generated by unsupervised handcrafted feature-based methods. |
755 | Rethinking Image Deraining via Rain Streaks and Vapors | Yinglong Wang; Yibing Song; Chao Ma; Bing Zeng; | In this work, we reformulate rain streaks as transmission medium together with vapors to model rain imaging. |
756 | Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes | Marcelo Gennari do Nascimento; Theo W. Costain; Victor Adrian Prisacariu; | We propose a novel method for neural network quantization that casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN. |
757 | Is Sharing of Egocentric Video Giving Away Your Biometric Signature? | Daksh Thapar; Chetan Arora; Aditya Nigam; | In this work, we create a novel kind of privacy attack by extracting the wearer’s gait profile, a well known biometric signature, from such optical flow in the egocentric videos. |
758 | Captioning Images Taken by People Who Are Blind | Danna Gurari; Yinan Zhao; Meng Zhang; Nilavra Bhattacharya; | Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case. |
759 | Improving Semantic Segmentation via Decoupled Body and Edge Supervision | Xiangtai Li; Xia Li; Li Zhang; Guangliang Cheng; Jianping Shi; Zhouchen Lin; Shaohua Tan; Yunhai Tong; | In this paper, a new paradigm for semantic segmentation is proposed. |
760 | Conditional Entropy Coding for Efficient Video Compression | Jerry Liu; Shenlong Wang; Wei-Chiu Ma; Meet Shah; Rui Hu; Pranaab Dhawan; Raquel Urtasun; | We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. |
761 | Differentiable Feature Aggregation Search for Knowledge Distillation | Yushuo Guan; Pengyu Zhao; Bingxuan Wang; Yuanxing Zhang; Cong Yao; Kaigui Bian; Jian Tang; | Specifically, we introduce DFA, a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search, to efficiently find the aggregations. |
762 | Attention Guided Anomaly Localization in Images | Shashanka Venkataramanan; Kuan-Chuan Peng; Rajat Vikram Singh; Abhijit Mahalanobis; | Without the need of anomalous training images, we propose Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA), which localizes the anomaly with a convolutional latent variable to preserve the spatial information. |
763 | Self-supervised Video Representation Learning by Pace Prediction | Jiangliu Wang; Jianbo Jiao; Yun-Hui Liu; | This paper addresses the problem of self-supervised video representation learning from a new perspective — by video pace prediction. |
764 | Full-Body Awareness from Partial Observations | Chris Rockwell; David F. Fouhey; | We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems; |
765 | Reinforced Axial Refinement Network for Monocular 3D Object Detection | Lijie Liu; Chufan Wu; Jiwen Lu; Lingxi Xie; Jie Zhou; Qi Tian; | To improve the efficiency of sampling, we propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step. |
766 | Self-Supervised Multi-Task Procedure Learning from Instructional Videos | Ehsan Elhamifar; Dat Huynh; | We address the problem of unsupervised procedure learning from instructional videos of multiple tasks using Deep Neural Networks (DNNs). |
767 | CosyPose: Consistent multi-view multi-object 6D pose estimation | Yann Labbé; Justin Carpentier; Mathieu Aubry; Josef Sivic; | We introduce an approach for recovering the 6D pose of multiple known objects in a scene captured by a set of input images with unknown camera viewpoints. |
768 | In-Domain GAN Inversion for Real Image Editing | Jiapeng Zhu; Yujun Shen; Deli Zhao; Bolei Zhou; | To solve this problem, we propose an in-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures the inverted code to be semantically meaningful for editing. |
769 | Key Frame Proposal Network for Efficient Pose Estimation in Videos | Yuexi Zhang; Yin Wang; Octavia Camps; Mario Sznaier; | In this paper, we propose a novel method combining local approaches with global context. |
770 | Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning | Yuki Saito; Takuma Nakamura; Hirotaka Hachiya; Kenji Fukumizu; | In this study, we propose a novel deep learning architecture to address the abovementioned difficulties and also an efficient training framework for set-to-set matching. |
771 | Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs | Robin Rombach; Patrick Esser; Björn Ommer; | We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts. |
772 | Cross-Modal Weighting Network for RGB-D Salient Object Detection | Gongyang Li; Zhi Liu; Linwei Ye; Yang Wang; Haibin Ling; | In this paper, we propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD. |
773 | Open-set Adversarial Defense | Rui Shao; Pramuditha Perera; Pong C. Yuen; Vishal M. Patel; | In this paper, we show that open-set recognition systems are vulnerable to adversarial attacks. |
774 | Deep Image Compression using Decoder Side Information | Sharon Ayzik; Shai Avidan; | We present a Deep Image Compression neural network that relies on side information, which is only available to the decoder. |
775 | Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation | Jeevan Devaranjan; Amlan Kar; Sanja Fidler; | In this paper, we propose a generative model of synthetic scenes that reduces the distribution gap between the scene structure of generated scenes and a real target image dataset. |
776 | A Generic Visualization Approach for Convolutional Neural Networks | Ahmed Taha; Xitong Yang; Abhinav Shrivastava; Larry Davis; | We formulate attention visualization as a constrained optimization problem. |
777 | Interactive Annotation of 3D Object Geometry using 2D Scribbles | Tianchang Shen; Jun Gao; Amlan Kar; Sanja Fidler; | In this paper, we propose an interactive framework for annotating 3D object geometry from both point cloud data and RGB imagery. |
778 | Hierarchical Kinematic Human Mesh Recovery | Georgios Georgakis; Ren Li; Srikrishna Karanam; Terrence Chen; Jana Košecká; Ziyan Wu; | In this work, we address this gap by proposing a new technique for regression of human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model. |
779 | Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation | Jae-Han Lee; Chang-Su Kim; | An algorithm to combine multiple loss terms adaptively for training a monocular depth estimator is proposed in this work. |
780 | 3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View | Marc Badger; Yufu Wang; Adarsh Modh; Ammon Perkes; Nikos Kolotouros; Bernd G. Pfrommer; Marc F. Schmidt; Kostas Daniilidis; | To address this problem, we first introduce a model and multi-view optimization approach, which we use to capture the unique shape and pose space displayed by live birds. We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views. |
781 | We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos | Alex Andonian; Camilo Fosco; Mathew Monfort; Allen Lee; Rogerio Feris; Carl Vondrick; Aude Oliva; | Here, we propose an approach for learning semantic relational set abstractions on videos, inspired by human learning. |
782 | Joint Optimization for Multi-Person Shape Models from Markerless 3D-Scans | Samuel Zeitvogel; Johannes Dornheim; Astrid Laubenheimer; | We propose a markerless end-to-end training framework for parametric 3D human shape models. |
783 | Accurate RGB-D Salient Object Detection via Collaborative Learning | Wei Ji; Jingjing Li; Miao Zhang; Yongri Piao; Huchuan Lu; | In this paper, we propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way, which solves those problems tactfully. |
784 | Finding Your (3D) Center: 3D Object Detection Using a Learned Loss | David Griffiths; Jan Boehm; Tobias Ritschel; | Addressing this disparity, we introduce a new optimization procedure, which allows training for 3D detection with raw 3D scans while using as little as 5% of the object labels and still achieving comparable performance. |
785 | Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection | Ganlong Zhao; Guanbin Li; Ruijia Xu; Liang Lin; | In this paper, we are the first to reveal that the region proposal network (RPN) and region proposal classifier (RPC) in the endemic two-stage detectors (e.g., Faster RCNN) demonstrate significantly different transferability when facing large domain gap. |
786 | Two Stream Active Query Suggestion for Active Learning in Connectomics | Zudi Lin; Donglai Wei; Won-Dong Jang; Siyan Zhou; Xupeng Chen; Xueying Wang; Richard Schalek; Daniel Berger; Brian Matejek; Lee Kamentsky; Adi Peleg; Daniel Haehn; Thouis Jones; Toufiq Parag; Jeff Lichtman; Hanspeter Pfister; | To tackle this, we propose a two-stream active query suggestion approach. |
787 | Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images | Jiahui Lei; Srinath Sridhar; Paul Guerrero; Minhyuk Sung; Niloy Mitra; Leonidas J. Guibas; | We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views. |
788 | 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference | Mai Bui; Tolga Birdal; Haowen Deng; Shadi Albarqouni; Leonidas Guibas; Slobodan Ilic; Nassir Navab; | We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined |
789 | Modeling Artistic Workflows for Image Generation and Editing | Hung-Yu Tseng; Matthew Fisher; Jingwan Lu; Yijun Li; Vladimir Kim; Ming-Hsuan Yang; | Motivated by the above observations, we propose a generative model that follows a given artistic workflow, enabling both multi-stage image generation as well as multi-stage image editing of an existing piece of art. |
790 | A Large-scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks with Deep Neural Networks | Sangpil Kim; Hyung-gun Chi; Xiao Hu; Qixing Huang; Karthik Ramani; | We introduce a large-scale annotated mechanical components benchmark for classification and retrieval tasks named Mechanical Components Benchmark (MCB): a large-scale dataset of 3D objects of mechanical components. |
791 | Hidden Footprints: Learning Contextual Walkability from 3D Human Trails | Jin Sun; Hadar Averbuch-Elor; Qianqian Wang; Noah Snavely; | We tackle this problem by leveraging information from existing datasets, without any additional labeling. |
792 | Self-Supervised Learning of Audio-Visual Objects from Video | Triantafyllos Afouras; Andrew Owens; Joon Son Chung; Andrew Zisserman; | Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. |
793 | GAN-based Garment Generation Using Sewing Pattern Images | Yu Shen; Junbang Liang; Ming C. Lin; | We propose a unified method using the generative network. |
794 | Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach | Chaitanya Ahuja; Dong Won Lee; Yukiko I. Nakano; Louis-Philippe Morency; | In this paper, we propose a new model, named Mix-StAGE, which trains a single model for multiple speakers while learning unique style embeddings for each speaker’s gestures in an end-to-end manner. |
795 | An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds | Rui Huang; Wanyue Zhang; Abhijit Kundu; Caroline Pantofaru; David A Ross; Thomas Funkhouser; Alireza Fathi; | To address this problem, in this paper we propose a sparse LSTM-based multi-frame 3D object detection algorithm. |
796 | Monotonicity Prior for Cloud Tomography | Tamar Loeub; Aviad Levis; Vadim Holodovsky; Yoav Y. Schechner; | We introduce a differentiable monotonicity prior, useful to express signals of monotonic tendency. |
797 | Learning Trailer Moments in Full-Length Movies with Co-Contrastive Attention | Lezi Wang; Dong Liu; Rohit Puri; Dimitris N. Metaxas; | We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs, where the moments highly correlated with trailers are expected to be scored higher than the uncorrelated moments. |
798 | Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval | Christopher Thomas; Adriana Kovashka; | We propose novel within-modality losses which encourage semantic coherency in both the text and image subspaces, which does not necessarily align with visual coherency. |
799 | Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Vishvak Murahari; Dhruv Batra; Devi Parikh; Abhishek Das; | Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. |
800 | Learning to Generate Grounded Visual Captions without Localization Supervision | Chih-Yao Ma; Yannis Kalantidis; Ghassan AlRegib; Peter Vajda; Marcus Rohrbach; Zsolt Kira; | In this work, we help the model to achieve this via a novel cyclical training regimen that forces the model to localize each word in the image after the sentence decoder generates it, and then reconstruct the sentence from the localized image region(s) to match the ground-truth. |
801 | Neural Hair Rendering | Menglei Chai; Jian Ren; Sergey Tulyakov; | In this paper, we propose a generic neural-based hair rendering pipeline that can synthesize photo-realistic images from virtual 3D hair models. |
802 | JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling | Noranart Vesdapunt; Mitch Rundle; HsiangTao Wu; Baoyuan Wang; | In this paper, we introduce a novel approach to learn a 3D face model using a joint-based face rig and a neural skinning network. |
803 | On Disentangling Spoof Trace for Generic Face Anti-Spoofing | Yaojie Liu; Joel Stehouwer; Xiaoming Liu; | This work designs a novel adversarial learning framework to disentangle the spoof traces from input faces as a hierarchical combination of patterns at multiple scales. |
804 | Streaming Object Detection for 3-D Point Clouds | Wei Han; Zhengdong Zhang; Benjamin Caine; Brandon Yang; Christoph Sprunk; Ouais Alsharif; Jiquan Ngiam; Vijay Vasudevan; Jonathon Shlens; Zhifeng Chen; | In this work, we explore how to build an object detector that removes this artificial latency constraint, and instead operates on native streaming data in order to significantly reduce latency. |
805 | NAS-DIP: Learning Deep Image Prior with Neural Architecture Search | Yun-Chun Chen; Chen Gao; Esther Robb; Jia-Bin Huang; | Building upon a generic U-Net architecture, our core contribution lies in designing new search spaces for (1) an upsampling cell and (2) a pattern of cross-scale residual connections. |
806 | Learning to Learn in a Semi-Supervised Fashion | Yun-Chun Chen; Chao-Te Chou; Yu-Chiang Frank Wang; | To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme. |
807 | FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning | Chia-Wen Kuo; Chih-Yao Ma; Jia-Bin Huang; Zsolt Kira; | In this paper, we propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations. |
808 | RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects | Bin Yang; Runsheng Guo; Ming Liang; Sergio Casas; Raquel Urtasun; | To better address this, we propose a new solution that exploits both LiDAR and Radar sensors for perception. |
809 | Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation | Medhini Narasimhan; Erik Wijmans; Xinlei Chen; Trevor Darrell; Dhruv Batra; Devi Parikh; Amanpreet Singh; | We introduce a learning-based approach for room navigation using semantic maps. |
810 | Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes | Chenhongyi Yang; Vitaly Ablavsky; Kaihong Wang; Qi Feng; Margrit Betke; | In this work, we propose a novel Non-Maximum-Suppression (NMS) algorithm that dramatically improves the detection recall while maintaining high precision in scenes with heavy occlusions. |
811 | Towards causal benchmarking of bias in face analysis algorithms | Guha Balakrishnan; Yuanjun Xiong; Wei Xia; Pietro Perona; | To address this problem we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. |
812 | Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation | Tong He; Dong Gong; Zhi Tian; Chunhua Shen; | To tackle the above issue, we propose a memory-augmented network that learns and memorizes the representative prototypes that encode both geometry and semantic information. |
813 | Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Noa Garcia; Yuta Nakashima; | Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialog comprehension, scene reasoning, and storyline recalling. |
814 | Transformation Consistency Regularization – A Semi-Supervised Paradigm for Image-to-Image Translation | Aamir Mustafa; Rafal K. Mantiuk; | We propose Transformation Consistency Regularization, which delves into a more challenging setting of image-to-image translation, which remains unexplored by semi-supervised algorithms. |
815 | LIRA: Lifelong Image Restoration from Unknown Blended Distortions | Jianzhao Liu; Jianxin Lin; Xin Li; Wei Zhou; Sen Liu; Zhibo Chen; | When the input is degraded by a new distortion, inspired by adult neurogenesis in human memory system, we develop a neural growing strategy where the previously trained model can incorporate a new expert branch and continually accumulate new knowledge without interfering with learned knowledge. |
816 | HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization | Jiahao Lin; Gim Hee Lee; | In this paper, we propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization in the camera coordinate space. |
817 | SOLO: Segmenting Objects by Locations | Xinlong Wang; Tao Kong; Chunhua Shen; Yuning Jiang; Lei Li; | We present a new, embarrassingly simple approach to instance segmentation in images. |
818 | Learning to See in the Dark with Events | Song Zhang; Yu Zhang; Zhe Jiang; Dongqing Zou; Jimmy Ren; Bin Zhou; | In this paper, we propose learning to see in the dark by translating the HDR events in low light to canonical sharp images as if captured in day light. |
819 | Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data | Tim Salzmann; Boris Ivanovic; Punarjay Chakravarty; Marco Pavone; | Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). |
820 | Context-Gated Convolution | Xudong Lin; Lin Ma; Wei Liu; Shih-Fu Chang; | Motivated by this, we propose one novel Context-Gated Convolution (CGC) to explicitly modify the weights of convolutional layers adaptively under the guidance of global context. |
821 | Polynomial Regression Network for Variable-Number Lane Detection | Bingke Wang; Zilei Wang; Yixin Zhang; | In this work, we propose to use polynomial curves to represent traffic lanes and then propose a novel polynomial regression network (PRNet) to directly predict them, where semantic segmentation is not involved. |
822 | Structural Deep Metric Learning for Room Layout Estimation | Wenzhao Zheng; Jiwen Lu; Jie Zhou; | In this paper, we propose a structural deep metric learning (SDML) method for room layout estimation, which aims to recover the 3D spatial layout of a cluttered indoor scene from a monocular RGB image. |
823 | Adaptive Task Sampling for Meta-Learning | Chenghao Liu; Zhihao Wang; Doyen Sahoo; Yuan Fang; Kun Zhang; Steven C.H. Hoi; | In this paper, we propose an adaptive task sampling method to improve the generalization performance. |
824 | Deep Complementary Joint Model for Complex Scene Registration and Few-shot Segmentation on Medical Images | Yuting He; Tiantian Li; Guanyu Yang; Youyong Kong; Yang Chen; Huazhong Shu; Jean-Louis Coatrieux; Jean-Louis Dillenseger; Shuo Li; | We propose a novel Deep Complementary Joint Model (DeepRS) for complex scene registration and few-shot segmentation. |
825 | Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems | Kailai Zhou; Linsen Chen; Xun Cao; | Inspired by this observation, we propose Modality Balance Network (MBNet) which facilitates the optimization process in a much more flexible and balanced manner. |
826 | High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling | Yu Zeng; Zhe Lin; Jimei Yang; Jianming Zhang; Eli Shechtman; Huchuan Lu; | To address this challenge, we propose an iterative inpainting method with a feedback mechanism. |
827 | Online Ensemble Model Compression using Knowledge Distillation | Devesh Walawalkar; Zhiqiang Shen; Marios Savvides; | This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. |
828 | Deep Learning-based Pupil Center Detection for Fast and Accurate Eye Tracking System | Kang Il Lee; Jung Ho Jeon; Byung Cheol Song; | Thus, we propose more accurate pupil center detection by improving the representation quality of the network in charge of pupil center detection. |
829 | Efficient Residue Number System Based Winograd Convolution | Zhi-Gang Liu; Matthew Mattina; | Our work extends the Winograd algorithm to Residue Number System (RNS). |
830 | Robust Tracking against Adversarial Attacks | Shuai Jia; Chao Ma; Yibing Song; Xiaokang Yang; | We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms. |
831 | Single-Shot Neural Relighting and SVBRDF Estimation | Shen Sang; Manmohan Chandraker; | We present a novel physically-motivated deep network for joint shape and material estimation, as well as relighting under novel illumination conditions, using a single image captured by a mobile phone camera. |
832 | Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement | Qiang Nie; Ziwei Liu; Yunhui Liu; | In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner. |
833 | Angle-based Search Space Shrinking for Neural Architecture Search | Yiming Hu; Yuding Liang; Zichao Guo; Ruosi Wan; Xiangyu Zhang; Yichen Wei; Qingyi Gu; Jian Sun; | In this work, we present a simple and general search space shrinking method, called Angle-Based search space Shrinking (ABS), for Neural Architecture Search (NAS). |
834 | RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition | Xiaoyu Yue; Zhanghui Kuang; Chenhao Lin; Hongbin Sun; Wayne Zhang; | To suppress such side-effect, we propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module for scene text recognition. |
835 | Towards Fast, Accurate and Stable 3D Dense Face Alignment | Jianzhu Guo; Xiangyu Zhu; Yang Yang; Fan Yang; Zhen Lei; Stan Z. Li; | In this paper, we propose a novel regression framework which makes a balance among speed, accuracy and stability. |
836 | Iterative Feature Transformation for Fast and Versatile Universal Style Transfer | Tai-Yin Chiu; Danna Gurari; | We propose a new transformation that iteratively stylizes features with analytical gradient descent. |
837 | CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search | Xin Chen; Yawen Duan; Zewei Chen; Hang Xu; Zihao Chen; Xiaodan Liang; Tong Zhang; Zhenguo Li; | This is the first work to our knowledge that proposes an efficient transferrable NAS solution while maintaining robustness across various settings. |
838 | Toward Faster and Simpler Matrix Normalization via Rank-1 Update | Tan Yu; Yunfeng Cai; Ping Li; | To overcome these limitations, we propose a rank-1 update normalization (RUN), which only needs matrix-vector multiplications and thus is significantly more efficient than NS iteration using matrix-matrix multiplications. |
839 | Accurate Polarimetric BRDF for Real Polarization Scene Rendering | Yuhi Kondo; Taishi Ono; Legong Sun; Yasutaka Hirasawa; Jun Murayama; | In this paper, we propose a new polarimetric BRDF (pBRDF) model. |
840 | Lensless Imaging with Focusing Sparse URA Masks in Long-Wave Infrared and its Application for Human Detection | Ilya Reshetouski; Hideki Oyaizu; Kenichiro Nakamura; Ryuta Satoh; Suguru Ushiki; Ryuichi Tadano; Atsushi Ito; Jun Murayama; | We introduce a lensless imaging framework for contemporary computer vision applications in long-wavelength infrared (LWIR). |
841 | Topology-Preserving Class-Incremental Learning | Xiaoyu Tao; Xinyuan Chang; Xiaopeng Hong; Xing Wei; Yihong Gong; | On this basis, we propose a novel topology-preserving class-incremental learning (TPCIL) framework. |
842 | Inter-Image Communication for Weakly Supervised Localization | Xiaolin Zhang; Yunchao Wei; Yi Yang; | In this paper, we propose to leverage pixel-level similarities across different objects for learning more accurate object locations in a complementary way. |
843 | UFO²: A Unified Framework towards Omni-supervised Object Detection | Zhongzheng Ren; Zhiding Yu; Xiaodong Yang; Ming-Yu Liu; Alexander G. Schwing; Jan Kautz; | In this paper, we present UFO², a unified object detection framework that can handle different forms of supervision simultaneously. |
844 | iCaps: An Interpretable Classifier via Disentangled Capsule Networks | Dahuin Jung; Jonghyun Lee; Jihun Yi; Sungroh Yoon; | In this work, we address these two limitations using a novel class-supervised disentanglement algorithm and an additional regularizer, respectively. |
845 | Detecting Natural Disasters, Damage, and Incidents in the Wild | Ethan Weber; Nuria Marzo; Dim P. Papadopoulos; Aritro Biswas; Agata Lapedriza; Ferda Ofli; Muhammad Imran; Antonio Torralba; | In this work, we present the Incidents Dataset, which contains 446,684 images annotated by humans that cover 43 incidents across a variety of scenes. |
846 | Dynamic ReLU | Yinpeng Chen; Xiyang Dai; Mengchen Liu; Dongdong Chen; Lu Yuan; Zicheng Liu; | In this paper, we propose dynamic ReLU (DY-ReLU), a dynamic rectifier of which parameters are generated by a hyper function over all input elements. |
847 | Acquiring Dynamic Light Fields through Coded Aperture Camera | Kohei Sakai; Keita Takahashi; Toshiaki Fujii; Hajime Nagahara; | We investigate the problem of compressive acquisition of a dynamic light field. |
848 | Gait Recognition from a Single Image using a Phase-Aware Gait Cycle Reconstruction Network | Chi Xu; Yasushi Makihara; Xiang Li; Yasushi Yagi; Jianfeng Lu; | We propose a method of gait recognition just from a single image for the first time, which enables latency-free gait recognition. |
849 | Informative Sample Mining Network for Multi-Domain Image-to-Image Translation | Jie Cao; Huaibo Huang; Yi Li; Ran He; Zhenan Sun; | In this paper, we reveal that improving the sample selection strategy is an effective solution. |
850 | Spherical Feature Transform for Deep Metric Learning | Yuke Zhu; Yan Bai; Yichen Wei; | This work proposes a novel spherical feature transform approach. |
851 | Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering | Ruixue Tang; Chao Ma; Wei Emma Zhang; Qi Wu; Xiaokang Yang; | In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. |
852 | Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes | Ran Song; Wei Zhang; Yitian Zhao; Yonghuai Liu; | We present an unsupervised 3D deep learning framework based on a ubiquitously true proposition named by us view-object consistency as it states that a 3D object and its projected 2D views always belong to the same object class. |
853 | Representation Sharing for Fast Object Detector Search and Beyond | Yujie Zhong; Zelu Deng; Sheng Guo; Matthew R. Scott; Weilin Huang; | To enhance such capability, we propose an extremely efficient neural architecture search method, named Fast And Diverse (FAD), to better explore the optimal configuration of receptive fields and convolution types in the sub-networks for one-stage detectors. |
854 | Peeking into occluded joints: A novel framework for crowd pose estimation | Lingteng Qiu; Xuanye Zhang; Yanran Li; Guanbin Li; Xiaojun Wu; Zixiang Xiong; Xiaoguang Han; Shuguang Cui; | Therefore, we thoroughly pursue this problem and propose a novel OPEC-Net framework together with a new Occluded Pose (OCPose) dataset with 9k annotated images. |
855 | RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition | Linxi Fan; Shyamal Buch; Guanzhi Wang; Ryan Cao; Yuke Zhu; Juan Carlos Niebles; Li Fei-Fei; | To this end, we introduce RubiksNet, a new efficient architecture for video action recognition which is based on a proposed learnable 3D spatiotemporal shift operation instead. |
856 | Deep Hashing with Active Pairwise Supervision | Ziwei Wang; Quan Zheng; Jiwen Lu; Jie Zhou; | In this paper, we propose a Deep Hashing method with Active Pairwise Supervision (DH-APS). |
857 | Graph Edit Distance Reward: Learning to Edit Scene Graph | Lichang Chen; Guosheng Lin; Shijie Wang; Qingyao Wu; | In this paper, we propose a new method to edit the scene graph according to the user instructions, which has never been explored. |
858 | Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing | Yajie Xing; Jingbo Wang; Gang Zeng; | In this paper, we propose a novel operator called malleable 2.5D convolution to learn the receptive field along the depth-axis. |
859 | Feature-metric Loss for Self-supervised Learning of Depth and Egomotion | Chang Shu; Kun Yu; Zhixiang Duan; Kuiyuan Yang; | In this work, feature-metric loss is proposed and defined on feature representation, where the feature representation is also learned in a self-supervised manner and regularized by both first-order and second-order derivatives to constrain the loss landscapes to form proper convergence basins. |
860 | Propagating Over Phrase Relations for One-Stage Visual Grounding | Sibei Yang; Guanbin Li; Yizhou Yu; | In this paper, we propose a linguistic structure guided propagation network for one-stage phrase grounding. |
861 | Adversarial Semantic Data Augmentation for Human Pose Estimation | Yanrui Bin; Xuan Cao; Xinya Chen; Yanhao Ge; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Changxin Gao; Nong Sang; | We instead propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity. |
862 | Free View Synthesis | Gernot Riegler; Vladlen Koltun; | We present a method for novel view synthesis from input images that are freely distributed around a scene. |
863 | Face Anti-Spoofing via Disentangled Representation Learning | Ke-Yue Zhang; Taiping Yao; Jian Zhang; Ying Tai; Shouhong Ding; Jilin Li; Feiyue Huang; Haichuan Song; Lizhuang Ma; | In this paper, motivated by disentangled representation learning, we propose a novel perspective of face anti-spoofing that disentangles the liveness features and content features from images, and the liveness features are further used for classification. |
864 | Prime-Aware Adaptive Distillation | Youcai Zhang; Zhonghao Lan; Yuchen Dai; Fangao Zeng; Yan Bai; Jie Chang; Yichen Wei; | This paper introduces the adaptive sample weighting to KD. |
865 | Meta-Learning with Network Pruning | Hongduan Tian; Bo Liu; Xiao-Tong Yuan; Qingshan Liu; | To remedy this deficiency, we propose a network pruning based meta-learning approach for overfitting reduction via explicitly controlling the capacity of network. |
866 | Spiral Generative Network for Image Extrapolation | Dongsheng Guo; Hongzhi Liu; Haoru Zhao; Yunhao Cheng; Qingwei Song; Zhaorui Gu; Haiyong Zheng; Bing Zheng; | In this paper, motivated by human natural ability to perceive unseen surroundings imaginatively, we propose a novel Spiral Generative Network, SpiralNet, to perform image extrapolation in a spiral manner, which regards extrapolation as an evolution process growing from an input sub-image along a spiral curve to an expanded full image. |
867 | SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches | Fang Liu; Changqing Zou; Xiaoming Deng; Ran Zuo; Yu-Kun Lai; Cuixia Ma; Yong-Jin Liu; Hongan Wang; | In this paper, for the first time, we study the fine-grained scene-level SBIR problem which aims at retrieving scene images satisfying the user’s specific requirements via a freehand scene sketch. |
868 | Few-shot Compositional Font Generation with Dual Memory | Junbum Cha; Sanghyuk Chun; Gayoung Lee; Bado Lee; Seonghyeon Kim; Hwalsuk Lee; | In this paper, we focus on compositional scripts, a widely used letter system in the world, where each glyph can be decomposed by several components. |
869 | PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling | Yue Qian; Junhui Hou; Sam Kwong; Ying He; | In this paper, we propose a novel deep neural network based method, called PUGeo-Net, for upsampling 3D point clouds. |
870 | Handcrafted Outlier Detection Revisited | Luca Cavalli; Viktor Larsson; Martin Ralf Oswald; Torsten Sattler; Marc Pollefeys; | Based on best practices, we propose a hierarchical pipeline for effective outlier detection as well as integrate novel ideas which in sum lead to an efficient and competitive approach to outlier rejection. |
871 | The Average Mixing Kernel Signature | Luca Cosmo; Giorgia Minello; Michael Bronstein; Luca Rossi; Andrea Torsello; | We introduce the Average Mixing Kernel Signature (AMKS), a novel signature for points on non-rigid three-dimensional shapes based on the average mixing kernel and continuous-time quantum walks. |
872 | BCNet: Learning Body and Cloth Shape from A Single Image | Boyi Jiang; Juyong Zhang; Yang Hong; Jinhao Luo; Ligang Liu; Hujun Bao; | In this paper, we consider the problem to automatically reconstruct garment and body shapes from a single near-front view RGB image. To train our model, we construct two large scale datasets with ground truth body and garment geometries as well as paired color images. |
873 | Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos | Umer Rafi; Andreas Doering; Bastian Leibe; Juergen Gall; | To address this issue, we propose an approach that relies on key point correspondences for associating persons in videos. |
874 | Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration | Jingwen He; Chao Dong; Yu Qiao; | To make a step forward, this paper presents a new problem setup, called multi-dimension (MD) modulation, which aims at modulating output effects across multiple degradation types and levels. |
875 | Polysemy Deciphering Network for Human-Object Interaction Detection | Xubin Zhong; Changxing Ding; Xian Qu; Dacheng Tao; | To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net), which decodes the visual polysemy of verbs for HOI detection in three ways. |
876 | PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning | Arthur Douillard; Matthieu Cord; Charles Ollion; Thomas Robert; Eduardo Valle; | In this work, we propose PODNet, a model inspired by representation learning. |
877 | Learning Graph-Convolutional Representations for Point Cloud Denoising | Francesca Pistilli; Giulia Fracastoro; Diego Valsesia; Enrico Magli; | We propose a deep neural network based on graph-convolutional layers that can elegantly deal with the permutation-invariance problem encountered by learning-based point cloud processing methods. |
878 | Semantic Line Detection Using Mirror Attention and Comparative Ranking and Matching | Dongkwon Jin; Jun-Tae Lee; Chang-Su Kim; | A novel algorithm to detect semantic lines is proposed in this paper. |
879 | A Differentiable Recurrent Surface for Asynchronous Event-Based Data | Marco Cannici; Marco Ciccone; Andrea Romanoni; Matteo Matteucci; | In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces. |
880 | Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches | Ruoyi Du; Dongliang Chang; Ayan Kumar Bhunia; Jiyang Xie; Zhanyu Ma; Yi-Zhe Song; Jun Guo; | In this work, we propose a novel framework for fine-grained visual classification to tackle these problems. |
881 | LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation | Tak-Wai Hui; Chen Change Loy; | In this paper, we introduce LiteFlowNet3, a deep network consisting of two specialized modules, to address the above challenges. |
882 | Microscopy Image Restoration with Deep Wiener-Kolmogorov Filters | Valeriya Pronina; Filippos Kokkinos; Dmitry V. Dylov; Stamatios Lefkimmiatis; | In this work, we propose a unifying framework of algorithms for Gaussian image deblurring and denoising. |
883 | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | Dave Zhenyu Chen; Angel X. Chang; Matthias Nießner; | In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes. |
884 | JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds | Zeyu Hu; Mingmin Zhen; Xuyang Bai; Hongbo Fu; Chiew-lan Tai; | In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks. |
885 | Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior | Hu Zhang; Linchao Zhu; Yi Zhu; Yi Yang; | In this paper, we aim to attack video models by utilizing intrinsic movement pattern and regional relative motion among video frames. |
886 | An Inference Algorithm for Multi-Label MRF-MAP Problems with Clique Size 100 | Ishant Shanu; Siddhant Bharti; Chetan Arora; S. N. Maheshwari; | In this paper, we propose an algorithm for optimal solutions to submodular higher-order multi-label MRF-MAP energy functions which can handle practical computer vision problems with up to 16 labels and cliques of size 100. |
887 | Dual Refinement Underwater Object Detection Network | Baojie Fan; Wei Chen; Yang Cong; Jiandong Tian; | To address these problems, we propose an underwater detection framework with feature enhancement and anchor refinement. |
888 | Multiple Sound Sources Localization from Coarse to Fine | Rui Qian; Di Hu; Heinrich Dinkel; Mengyue Wu; Ning Xu; Weiyao Lin; | To solve this problem, we develop a two-stage audiovisual learning framework that disentangles audio and visual representations of different categories from complex scenes, then performs cross-modal feature alignment in a coarse-to-fine manner. |
889 | Task-Aware Quantization Network for JPEG Image Compression | Jinyoung Choi; Bohyung Han; | We propose to learn a deep neural network for JPEG image compression, which predicts image-specific optimized quantization tables fully compatible with the standard JPEG encoder and decoder. |
890 | Energy-Based Models for Deep Probabilistic Regression | Fredrik K. Gustafsson; Martin Danelljan; Goutam Bhat; Thomas B. Schön; | We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. |
891 | CLOTH3D: Clothed 3D Humans | Hugo Bertiche; Meysam Madadi; Sergio Escalera; | We present CLOTH3D, the first big scale synthetic dataset of 3D clothed human sequences. |
892 | Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images | Kang Zhou; Yuting Xiao; Jianlong Yang; Jun Cheng; Wen Liu; Weixin Luo; Zaiwang Gu; Jiang Liu; Shenghua Gao; | Motivated by this, we propose to leverage the relation between the image texture and structure to design a deep neural network for anomaly detection. |
893 | CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers | Xingping Dong; Jianbing Shen; Ling Shao; Fatih Porikli; | In this paper, we provide a deep analysis for Siamese-based trackers and find that the one core reason for their failure on challenging cases can be attributed to the problem of decisive samples missing during offline training. |
894 | Occlusion-Aware Siamese Network for Human Pose Estimation | Lu Zhou; Yingying Chen; Yunze Gao; Jinqiao Wang; Hanqing Lu; | To conquer this dilemma, we propose an occlusion-aware siamese network to improve the performance. |
895 | Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model | Yufan Liu; Minglang Qiao; Mai Xu; Bing Li; Weiming Hu; Ali Borji; | In this paper, we thoroughly investigate such influences by establishing a large-scale eye-tracking database of Multiple-face Video in Visual-Audio condition (MVVA). |
896 | NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image | Lizhen Wang; Xiaochen Zhao; Tao Yu; Songtao Wang; Yebin Liu; | We propose NormalGAN, a fast adversarial learning-based method to reconstruct the complete and detailed 3D human from a single RGB-D image. |
897 | Model-based occlusion disentanglement for image-to-image translation | Fabio Pizzati; Pietro Cerri; Raoul de Charette; | Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model. |
898 | Rotation-robust Intersection over Union for 3D Object Detection | Yu Zheng; Danyang Zhang; Sinan Xie; Jiwen Lu; Jie Zhou; | In this paper, we propose a Rotation-robust Intersection over Union (RIoU) for 3D object detection, which aims to jointly learn the overlap of rotated bounding boxes. |
899 | New Threats against Object Detector with Non-local Block | Yi Huang; Fan Wang; Adams Wai-Kin Kong; Kwok-Yan Lam; | In this paper, two new threats named disappearing attack and appearing attack against object detectors with a non-local block are investigated. |
900 | Self-Supervised CycleGAN for Object-Preserving Image-to-Image Domain Adaptation | Xinpeng Xie; Jiawei Chen; Yuexiang Li; Linlin Shen; Kai Ma; Yefeng Zheng; | In this paper, we propose a novel GAN (namely OP-GAN) to address the problem, which involves a self-supervised module to enforce the image content consistency during image-to-image translations without any extra annotations. |
901 | On the Usage of the Trifocal Tensor in Motion Segmentation | Federica Arrigoni; Luca Magri; Tomas Pajdla; | In this paper we address motion segmentation in multiple images by combining partial results coming from triplets of images, which are obtained by fitting a number of trifocal tensors to correspondences. |
902 | 3D-Rotation-Equivariant Quaternion Neural Networks | Wen Shen; Binbin Zhang; Shikun Huang; Zhihua Wei; Quanshi Zhang; | This paper proposes a set of rules to revise various neural networks for 3D point cloud processing to rotation-equivariant quaternion neural networks (REQNNs). |
903 | InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image | Gyeongsik Moon; Shoou-I Yu; He Wen; Takaaki Shiratori; Kyoung Mu Lee; | Therefore, we firstly propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image. |
904 | Active Crowd Counting with Limited Supervision | Zhen Zhao; Miaojing Shi; Xiaoxiao Zhao; Li Li; | In the last cycle when the labeling budget is met, the large amount of unlabeled data are also utilized: a distribution classifier is introduced to align the labeled data with unlabeled data; furthermore, we propose to mix up the distribution labels and latent representations of data in the network to particularly improve the distribution alignment in-between training samples. |
905 | Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance | Marvin Klingner; Jan-Aike Termöhlen; Jonas Mikolajczyk; Tim Fingscheidt; | In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. |
906 | Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language | Shaoxiang Chen; Yu-Gang Jiang; | In this paper, we propose a novel TALL method which builds a Hierarchical Visual-Textual Graph to model interactions between the objects and words as well as among the objects to jointly understand the video contents and the language. |
907 | Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On | Thibaut Issenhuth; Jérémie Mary; Clément Calauzènes; | In this paper, we propose a novel student-teacher paradigm where the teacher is trained in the standard way (reconstruction) before guiding the student to focus on the initial task (changing the cloth). |
908 | NODIS: Neural Ordinary Differential Scene Understanding | Yuren Cong; Hanno Ackermann; Wentong Liao; Michael Ying Yang; Bodo Rosenhahn; | In this work, we interpret that formulation as Ordinary Differential Equation (ODE). |
909 | AssembleNet++: Assembling Modality Representations via Attention Connections | Michael S. Ryoo; AJ Piergiovanni; Juhana Kangaspunta; Anelia Angelova; | We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network. |
910 | Learning Propagation Rules for Attribution Map Generation | Yiding Yang; Jiayan Qiu; Mingli Song; Dacheng Tao; Xinchao Wang; | In this paper, we propose a dedicated method to generate attribution maps that allow us to learn the propagation rules automatically, overcoming the flaws of the hand-crafted ones. |
911 | Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference | Menelaos Kanakis; David Bruggemann; Suman Saha; Stamatios Georgoulis; Anton Obukhov; Luc Van Gool; | In this paper, we show that both can be achieved simply by reparameterizing the convolutions of standard neural network architectures into a non-trainable shared part (filter bank) and task-specific parts (modulators), where each modulator has a fraction of the filter bank parameters. |
912 | Learning Predictive Models from Observation and Interaction | Karl Schmeckpeper; Annie Xie; Oleh Rybkin; Stephen Tian; Kostas Daniilidis; Sergey Levine; Chelsea Finn; | We address the first challenge by formulating the corresponding graphical model and treating the action as an observed variable for the interaction data and an unobserved variable for the observation data, and the second challenge by using a domain-dependent prior. |
913 | Unifying Deep Local and Global Features for Image Search | Bingyi Cao; André Araujo; Jack Sim; | In this work, our key contribution is to unify global and local features into a single deep model, enabling accurate retrieval with efficient feature extraction. |
914 | Human Body Model Fitting by Learned Gradient Descent | Jie Song; Xu Chen; Otmar Hilliges; | We propose a novel algorithm for the fitting of 3D human shape to images. |
915 | DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition | Matthew Korban; Xin Li; | We propose a Dynamic Directed Graph Convolutional Network (DDGCN) to model spatial and temporal features of human actions from their skeletal representations. |
916 | Learning latent representations across multiple data domains using Lifelong VAEGAN | Fei Ye; Adrian G. Bors; | In this paper, we propose a novel lifelong learning approach, namely the Lifelong VAEGAN (L-VAEGAN), which not only induces a powerful generative replay network but also learns meaningful latent representations, benefiting representation learning. |
917 | DVI: Depth Guided Video Inpainting for Autonomous Driving | Miao Liao; Feixiang Lu; Dingfu Zhou; Sibo Zhang; Wei Li; Ruigang Yang; | To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud. |
918 | Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation | Kenan E. Ak; Ning Xu; Zhe Lin; Yilin Wang; | To address these limitations, we propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models. |
919 | APRICOT: A Dataset of Physical Adversarial Attacks on Object Detection | A. Braunegg; Amartya Chakraborty; Michael Krumdick; Nicole Lape; Sara Leary; Keith Manville; Elizabeth Merkhofer; Laura Strickhart; Matthew Walmer; | We present APRICOT, a collection of over 1,000 annotated photographs of printed adversarial patches in public locations. |
920 | Visual Question Answering on Image Sets | Ankan Bansal; Yuting Zhang; Rama Chellappa; | We introduce the task of Image-Set Visual Question Answering (ISVQA), which generalizes the commonly studied single-image VQA problem to multi-image settings. |
921 | Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots | Qi Chen; Lin Sun; Zhixin Wang; Kui Jia; Alan Yuille; | We thus argue in this paper for an approach opposite to existing methods using object-level anchors. |
922 | Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations | Huaiyi Huang; Yuqi Zhang; Qingqiu Huang; Zhengkui Guo; Ziwei Liu; Dahua Lin; | In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places. |
923 | DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points | Ayan Sinha; Zak Murez; James Bartolozzi; Vijay Badrinarayanan; Andrew Rabinovich; | Distinct from cost volume approaches, we propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally densifying this sparse set of 3D points using CNNs. |
924 | Dynamic Low-light Imaging with Quanta Image Sensors | Yiheng Chi; Abhiram Gnanasambandam; Vladlen Koltun; Stanley H. Chan; | We propose a solution using Quanta Image Sensors (QIS) and present a new image reconstruction algorithm. |
925 | Disambiguating Monocular Depth Estimation with a Single Transient | Mark Nishimura; David B. Lindell; Christopher Metzler; Gordon Wetzstein; | In this work, we demonstrate how a depth histogram of the scene, which can be readily captured using a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve the depth ambiguity problem. |
926 | DSDNet: Deep Structured self-Driving Network | Wenyuan Zeng; Shenlong Wang; Renjie Liao; Yun Chen; Bin Yang; Raquel Urtasun; | In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network. |
927 | QuEST: Quantized Embedding Space for Transferring Knowledge | Himalaya Jain; Spyros Gidaris; Nikos Komodakis; Patrick Pérez; Matthieu Cord; | In this work, we propose a novel way to achieve this goal: by distilling the knowledge through a quantized visual words space. |
928 | EGDCL: An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis | Rongchang Zhao; Xuanlin Chen; Zailiang Chen; Shuo Li; | In this paper, we propose a novel curriculum learning paradigm (EGDCL) to train an unbiased glaucoma diagnosis model with the adaptive dual-curriculum. |
929 | Backpropagated Gradient Representations for Anomaly Detection | Gukyeong Kwon; Mohit Prabhushankar; Dogancan Temel; Ghassan AlRegib; | Hence, we propose the utilization of backpropagated gradients as representations to characterize model behavior on anomalies and, consequently, detect such anomalies. |
930 | Dense RepPoints: Representing Visual Objects with Dense Point Sets | Ze Yang; Yinghao Xu; Han Xue; Zheng Zhang; Raquel Urtasun; Liwei Wang; Stephen Lin; Han Hu; | We present a new object representation, called Dense RepPoints, which utilizes a large number of points to describe the multi-grained object representation of both box level and pixel level. |
931 | On Dropping Clusters to Regularize Graph Convolutional Neural Networks | Xikun Zhang; Chang Xu; Dacheng Tao; | To effectively regularize GCNs, we devise DropCluster which first randomly zeros some seed entries and then zeros entries that are spatially or depth-wisely correlated to those seed entries. |
932 | Adaptive Video Highlight Detection by Learning from User History | Mrigank Rochan; Mahesh Kumar Krishna Reddy; Linwei Ye; Yang Wang; | In this paper, we propose a simple yet effective framework that learns to adapt highlight detection to a user by exploiting the user’s history in the form of highlights that the user has previously created. |
933 | Improving 3D Object Detection through Progressive Population Based Augmentation | Shuyang Cheng; Zhaoqi Leng; Ekin Dogus Cubuk; Barret Zoph; Chunyan Bai; Jiquan Ngiam; Yang Song; Benjamin Caine; Vijay Vasudevan; Congcong Li; Quoc V. Le; Jonathon Shlens; Dragomir Anguelov; | In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. |
934 | DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction | Jiongchao Jin; Akshay Gadi Patil; Zhang Xiong; Hao Zhang; | We introduce a differential visual similarity metric to train deep neural networks for 3D reconstruction, aimed at improving reconstruction quality. |
935 | SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization | Xuefeng Hu; Zhihan Zhang; Zhenye Jiang; Syomantak Chaudhuri; Zhenheng Yang; Ram Nevatia; | We present a novel, Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations. |
936 | Adversarial Learning for Zero-shot Domain Adaptation | Jinghua Wang; Jianmin Jiang; | With the hypothesis that the shift between a given pair of domains is shared across tasks, we propose a new method for ZSDA by transferring domain shift from an irrelevant task (IrT) to the task of interest (ToI). |
937 | YOLO in the Dark – Domain Adaptation Method for Merging Multiple Models – | Yukihiro Sasagawa; Hajime Nagahara; | We propose a method of domain adaptation for merging multiple models with less effort than creating an additional dataset. |
938 | Identity-Aware Multi-Sentence Video Description | Jae Sung Park; Trevor Darrell; Anna Rohrbach; | We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires to re-identify persons locally within a set of consecutive clips. |
939 | VQA-LOL: Visual Question Answering under the Lens of Logic | Tejas Gokhale; Pratyay Banerjee; Chitta Baral; Yezhou Yang; | In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question about an image, are able to answer the logical composition of multiple such questions. |
940 | Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation | Mengyao Zhai; Lei Chen; Jiawei He; Megha Nawhal; Frederick Tung; Greg Mori; | In contrast, we propose a parameter efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks. |
941 | TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering | Xiaofeng Yang; Guosheng Lin; Fengmao Lv; Fayao Liu; | We propose a novel tiered reasoning method that dynamically selects object level candidates based on language representations and generates robust pairwise relations within the selected candidate objects. |
942 | Mining Inter-Video Proposal Relations for Video Object Detection | Mingfei Han; Yali Wang; Xiaojun Chang; Yu Qiao; | To address the limitation, we propose a novel Inter-Video Proposal Relation module. |
943 | TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval | Jie Lei; Licheng Yu; Tamara L. Berg; Mohit Bansal; | We introduce TV show Retrieval (TVR), a new multimodal retrieval dataset. |
944 | Minimum Class Confusion for Versatile Domain Adaptation | Ying Jin; Ximei Wang; Mingsheng Long; Jianmin Wang; | To this end, this paper studies Versatile Domain Adaptation (VDA), where one method can handle several different DA scenarios without any modification. |
945 | Large Batch Optimization for Object Detection: Training COCO in 12 Minutes | Tong Wang; Yousong Zhu; Chaoyang Zhao; Wei Zeng; Yaowei Wang; Jinqiao Wang; Ming Tang; | Specifically, we present a novel Periodical Moments Decay LAMB (PMD-LAMB) algorithm to effectively reduce the negative effects of the lagging historical gradients. |
946 | Towards Practical and Efficient High-Resolution HDR Deghosting with CNN | K. Ram Prabhakar; Susmit Agrawal; Durgesh Kumar Singh; Balraj Ashwath; R. Venkatesh Babu; | In this paper, we present a deep neural network based approach to generate high-quality ghost-free HDR for high-resolution images. |
947 | Monocular Differentiable Rendering for Self-Supervised 3D Object Detection | Deniz Beker; Hiroharu Kato; Mihai Adrian Morariu; Takahiro Ando; Toru Matsuoka; Wadim Kehl; Adrien Gaidon; | To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks. |
948 | Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation | Meng Tian; Marcelo H Ang Jr; Gim Hee Lee; | We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image. |
949 | Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction | Chaofan Tao; Qinhong Jiang; Lixin Duan; Ping Luo; | However, unlike previous work that isolated the spatial interaction, temporal coherence, and scene layout, this paper designs a new mechanism, i.e., Dynamic and Static Context-aware Motion Predictor (DSCMP), to integrate this rich information into the long short-term memory (LSTM). |
950 | Image-based table recognition: data, model, and evaluation | Xu Zhong; Elaheh ShafieiBavani; Antonio Jimeno Yepes; | To facilitate image-based table recognition with deep learning, we develop and release the largest publicly available table recognition dataset PubTabNet, containing 568k table images with corresponding structured HTML representation. |
951 | Group Activity Prediction with Sequential Relational Anticipation Model | Junwen Chen; Wentao Bao; Yu Kong; | In this paper, we propose a novel approach to predict group activities given the beginning frames with incomplete activity executions. |
952 | PiP: Planning-informed Trajectory Prediction for Autonomous Driving | Haoran Song; Wenchao Ding; Yuxuan Chen; Shaojie Shen; Michael Yu Wang; Qifeng Chen; | We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. |
953 | PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer | Duo Li; Anbang Yao; Qifeng Chen; | We bridge this regret by exploiting multi-scale features in a finer granularity. |
954 | Hierarchical Context Embedding for Region-based Object Detection | Zhao-Min Chen; Xin Jin; Borui Zhao; Xiu-Shen Wei; Yanwen Guo; | To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component, to facilitate the classification ability of a series of region-based detectors by mining contextual cues. |
955 | Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition | Jin Ye; Junjun He; Xiaojiang Peng; Wenhao Wu; Yu Qiao; | Our goal is to eliminate such bias and enhance the robustness of the learnt features. |
956 | Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection | Yuliang Guo; Guang Chen; Peitao Zhao; Weide Zhang; Jinghao Miao; Jingao Wang; Tae Eun Choe; | We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods. |
957 | Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction | Xin Xiong; Haipeng Xiong; Ke Xian; Chen Zhao; Zhiguo Cao; Xin Li; | In this work, we approach this problem by addressing two issues that have been under-researched in the open literature: sampling strategy (data term) and graph construction (prior term). |
958 | MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation | Kaisiyuan Wang; Qianyi Wu; Linsen Song; Zhuoqian Yang; Wayne Wu; Chen Qian; Ran He; Yu Qiao; Chen Change Loy; | To address this issue, we build the Multi-view Emotional Audio-visual Dataset (MEAD) which is a talking-face video corpus featuring 60 actors and actresses talking with 8 different emotions at 3 different intensity levels. |
959 | Detecting Human-Object Interactions with Action Co-occurrence Priors | Dong-Jin Kim; Xiao Sun; Jinsoo Choi; Stephen Lin; In So Kweon; | In this paper, we model the correlations as action co-occurrence matrices and present techniques to learn these priors and leverage them for more effective training, especially in rare classes. |
960 | Learning Connectivity of Neural Networks from a Topological Perspective | Kun Yuan; Quanquan Li; Jing Shao; Junjie Yan; | In this paper, we attempt to optimize the connectivity in neural networks. |
961 | JSTASR: Joint Size and Transparency-Aware Snow Removal Algorithm Based on Modified Partial Convolution and Veiling Effect Removal | Wei-Ting Chen; Hao-Yu Fang; Jian-Jiun Ding; Cheng-Che Tsai; Sy-Yen Kuo; | In this paper, first, we reformulate the snow model. Different from that in the previous works, in the proposed snow model, the veiling effect is included. Second, a novel joint size and transparency-aware snow removal algorithm called JSTASR is proposed. |
962 | Ocean: Object-aware Anchor-free Tracking | Zhipeng Zhang; Houwen Peng; Jianlong Fu; Bing Li; Weiming Hu; | In this paper, we propose a novel object-aware anchor-free network to address this issue. |
963 | Object Tracking using Spatio-Temporal Networks for Future Prediction Location | Yuan Liu; Ruoteng Li; Yu Cheng; Robby T. Tan; Xiubao Sui; | We introduce an object tracking algorithm that predicts the future locations of the target object and assists the tracker to handle object occlusion. |
964 | Pillar-based Object Detection for Autonomous Driving | Yue Wang; Alireza Fathi; Abhijit Kundu; David A. Ross; Caroline Pantofaru; Tom Funkhouser; Justin Solomon; | We present a simple and flexible object detection framework optimized for autonomous driving. |
965 | Sparse Adversarial Attack via Perturbation Factorization | Yanbo Fan; Baoyuan Wu; Tuanhui Li; Yong Zhang; Mingyang Li; Zhifeng Li; Yujiu Yang; | This work studies the sparse adversarial attack, which aims to generate adversarial perturbations onto partial positions of one benign image, such that the perturbed image is incorrectly predicted by one deep neural network (DNN) model. |
966 | 3D Scene Reconstruction from a Single Viewport | Maximilian Denninger; Rudolph Triebel; | We present a novel approach to infer volumetric reconstructions from a single viewport, based only on an RGB image and a reconstructed normal image. |
967 | Learning to Optimize Domain Specific Normalization for Domain Generalization | Seonguk Seo; Yumin Suh; Dongwan Kim; Geeho Kim; Jongwoo Han; Bohyung Han; | We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains. |
968 | Self-supervised Outdoor Scene Relighting | Ye Yu; Abhimitra Meka; Mohamed Elgharib; Hans-Peter Seidel; Christian Theobalt; William A. P. Smith; | In contrast, we propose a self-supervised approach for relighting. |
969 | Privacy Preserving Visual SLAM | Mikiya Shibuya; Shinya Sumikura; Ken Sakurada; | This study proposes a privacy-preserving Visual SLAM framework for estimating camera poses and performing bundle adjustment with mixed line and point clouds in real time. |
970 | Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning | Valentina Sanguineti; Pietro Morerio; Niccolò Pozzetti; Danilo Greco; Marco Cristani; Vittorio Murino; | In this paper, we propose the use of a new modality characterized by a richer information content, namely acoustic images, for the sake of audio-visual scene understanding. |
971 | Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval | Yanbei Chen; Loris Bazzani; | In this work, we study the problem of composing images and textual modifications for language-guided retrieval in the context of fashion applications. |
972 | Globally Optimal and Efficient Vanishing Point Estimation in Atlanta World | Haoang Li; Pyojin Kim; Ji Zhao; Kyungdon Joo; Zhipeng Cai; Zhe Liu; Yun-Hui Liu; | To overcome these limitations, we propose the novel mine-and-stab (MnS) algorithm and embed it in the branch-and-bound (BnB) algorithm. |
973 | StyleGAN2 Distillation for Feed-forward Image Manipulation | Yuri Viazovetskyi; Vladimir Ivashkin; Evgeny Kashin; | We propose a way to distill a particular image manipulation of StyleGAN2 into an image-to-image network trained in a paired way. |
974 | Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds | Jinxian Liu; Minghui Yu; Bingbing Ni; Ye Chen; | We develop a novel learning scheme named Self-Prediction for 3D instance and semantic segmentation of point clouds. |
975 | Learning Disentangled Representations via Mutual Information Estimation | Eduardo Hugo Sanchez; Mathieu Serrurier; Mathias Ortner; | In this paper, we investigate the problem of learning disentangled representations. |
976 | Challenge-Aware RGBT Tracking | Chenglong Li; Lei Liu; Andong Lu; Qing Ji; Jin Tang; | In this paper, we propose a novel challenge-aware neural network to handle the modality-shared challenges (e.g., fast motion, scale variation and occlusion) and the modality-specific ones (e.g., illumination variation and thermal crossover) for RGBT tracking. |
977 | Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration | Bruno Lecouat; Jean Ponce; Julien Mairal; | We propose a novel differentiable relaxation of joint sparsity that exploits both principles and leads to a general framework for image restoration which is (1) trainable end to end, (2) fully interpretable, and (3) much more compact than competing deep learning architectures. |
978 | AutoSimulate: (Quickly) Learning Synthetic Data Generation | Harkirat Singh Behl; Atilim Güneş Baydin; Ran Gal; Philip H.S. Torr; Vibhav Vineet; | We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. |
979 | LatticeNet: Towards Lightweight Image Super-resolution with Lattice Block | Xiaotong Luo; Yuan Xie; Yulun Zhang; Yanyun Qu; Cuihua Li; Yun Fu; | To address this problem, we focus on the lightweight models for fast and accurate image SR. |
980 | Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation | M.Naseer Subhani; Mohsen Ali; | In this paper, we propose a novel approach of exploiting scale-invariance property of the semantic segmentation model for self-supervised domain adaptation. |
981 | Active Visual Information Gathering for Vision-Language Navigation | Hanqing Wang; Wenguan Wang; Tianmin Shu; Wei Liang; Jianbing Shen; | To achieve this, we propose an end-to-end framework for learning an exploration policy that decides i) when and where to explore, ii) what information is worth gathering during exploration, and iii) how to adjust the navigation decision after the exploration. |
982 | Deep Hough-Transform Line Priors | Yancong Lin; Silvia L. Pintea; Jan C. van Gemert; | Here, we reduce the dependency on labeled data by building on the classic knowledge-based priors while using deep networks to learn features. |
983 | Unsupervised Shape and Pose Disentanglement for 3D Meshes | Keyang Zhou; Bharat Lal Bhatnagar; Gerard Pons-Moll; | In this paper, we present a simple yet effective approach to learn disentangled shape and pose representations in an unsupervised setting. |
984 | CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection | Muhammad Zaigham Zaheer; Arif Mahmood; Marcella Astrid; Seung-Ik Lee; | In this work, we propose a weakly supervised anomaly detection method which has manifold contributions including 1) a random batch based training procedure to reduce inter-batch correlation, 2) a normalcy suppression mechanism to minimize anomaly scores of the normal regions of a video by taking into account the overall information available in one training batch, and 3) a clustering distance based loss to contribute towards mitigating the label noise and to produce better anomaly representations by encouraging our model to generate distinct normal and anomalous clusters. |
985 | Inclusive GAN: Improving Data and Minority Coverage in Generative Models | Ning Yu; Ke Li; Peng Zhou; Jitendra Malik; Larry Davis; Mario Fritz; | We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include, and validate its effectiveness at little compromise from the overall performance on the entire dataset. |
986 | SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects | Evangelos Ntavelis; Andrés Romero; Iason Kastanis; Luc Van Gool; Radu Timofte; | To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating or Erasing objects. |
987 | Dive Deeper Into Box for Object Detection | Ran Chen; Yong Liu; Mengdan Zhang; Shu Liu; Bei Yu; Yu-Wing Tai; | This motivates us to investigate a box reorganization method (DDBNet), which can dive deeper into the box to strive for more accurate localization. |
988 | PG-Net: Pixel to Global Matching Network for Visual Tracking | Bingyan Liao; Chenye Wang; Yayun Wang; Yaonong Wang; Jun Yin; | In this paper, a Pixel to Global Matching Network (PG-Net) is proposed to suppress the influence of background in search image while achieving state-of-the-art tracking performance. |
989 | Why Are Deep Representations Good Perceptual Quality Features? | Taimoor Tariq; Okan Tarhan Tursun; Munchurl Kim; Piotr Didyk; | We introduce two new formulations to measure the frequency and orientation selectivity of the features learned by convolutional layers for evaluating deep features learned by widely-used deep CNNs such as VGG-16. |
990 | Geometric Estimation via Robust Subspace Recovery | Aoxiang Fan; Xingyu Jiang; Yang Wang; Junjun Jiang; Jiayi Ma; | In this paper, we consider the problem from an optimization perspective, to exploit the intrinsic linear structure of point correspondences to assist estimation. |
991 | Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification | Sanath Narayan; Akshita Gupta; Fahad Shahbaz Khan; Cees G. M. Snoek; Ling Shao; | We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification. |
992 | Human Correspondence Consensus for 3D Object Semantic Understanding | Yujing Lou; Yang You; Chengkun Li; Zhoujun Cheng; Liangwei Li; Lizhuang Ma; Weiming Wang; Cewu Lu; | In this paper, we introduce a new dataset named CorresPondenceNet. |
993 | Learning Memory Augmented Cascading Network for Compressed Sensing of Images | Jiwei Chen; Yubao Sun; Qingshan Liu; Rui Huang; | In this paper, we propose a cascading network for compressed sensing of images with progressive reconstruction. |
994 | Least squares surface reconstruction on arbitrary domains | Dizhong Zhu; William A. P. Smith; | We propose a new method for computing numerical derivatives based on 2D Savitzky-Golay filters and K-nearest neighbour kernels. |
995 | Task-conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery | My Kieu; Andrew D. Bagdanov; Marco Bertini; Alberto del Bimbo; | In this paper, we propose a novel approach to domain adaptation that significantly improves pedestrian detection performance in the thermal domain. |
996 | Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting | Junhua Zou; Zhisong Pan; Junyang Qiu; Xin Liu; Ting Rui; Wei Li; | We introduce a three stage pipeline: resized-diverse-inputs (RDIM), diversity-ensemble (DEM) and region fitting, that work together to generate transferable adversarial examples. |
997 | DADA: Differentiable Automatic Data Augmentation | Yonggang Li; Guosheng Hu; Yongtao Wang; Timothy Hospedales; Neil M. Robertson; Yongxin Yang; | In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost. |
998 | SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans | Armen Avetisyan; Tatiana Khanova; Christopher Choy; Denver Dash; Angela Dai; Matthias Nießner; | We present a novel approach to reconstructing lightweight, CAD-based representations of scanned 3D environments from commodity RGB-D sensors. |
999 | Kinship Identification through Joint Learning using Kinship Verification Ensembles | Wei Wang; Shaodi You; Theo Gevers; | To this end, we propose a novel kinship identification approach based on joint training of kinship verification ensembles and classification modules. |
1000 | Kernelized Memory Network for Video Object Segmentation | Hongje Seong; Junhyuk Hyun; Euntai Kim; | To solve the mismatch between STM and VOS, we propose a kernelized memory network (KMN). |
1001 | A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection | Xiaoqi Zhao; Lihe Zhang; Youwei Pang; Huchuan Lu; Lei Zhang; | In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model. |
1002 | Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation | Tianyi Zhang; Guosheng Lin; Weide Liu; Jianfei Cai; Alex Kot; | In this paper we focus on the task of weakly-supervised semantic segmentation supervised with image-level labels. |
1003 | Temporal Keypoint Matching and Refinement Network for Pose Estimation and Tracking | Chunluan Zhou; Zhou Ren; Gang Hua; | In this paper, we mainly focus on improving pose association and estimation in a video to build a strong pose estimator and tracker. |
1004 | Neural Point-Based Graphics | Kara-Ali Aliev; Artem Sevastopolsky; Maria Kolos; Dmitry Ulyanov; Victor Lempitsky; | We present a new point-based approach for modeling the appearance of real scenes. |
1005 | FHDe²Net: Full High Definition Demoiréing Network | Bin He; Ce Wang; Boxin Shi; Ling-Yu Duan; | We propose the Full High Definition Demoiréing Network (FHDe²Net) to solve such problems. |
1006 | Learning Structural Similarity of User Interface Layouts using Graph Networks | Dipu Manandhar; Dan Ruta; John Collomosse; | We propose a novel representation learning technique for measuring the similarity of user interface designs. |
1007 | NAS-Count: Counting-by-Density with Neural Architecture Search | Yutao Hu; Xiaolong Jiang; Xuhui Liu; Baochang Zhang; Jungong Han; Xianbin Cao; David Doermann; | In this work, we automate the design of counting models with Neural Architecture Search (NAS) and introduce an end-to-end searched encoder-decoder architecture, Automatic Multi-Scale Network (AMSNet). |
1008 | Towards Generalization Across Depth for Monocular 3D Object Detection | Andrea Simonelli; Samuel Rota Buló; Lorenzo Porzi; Elisa Ricci; Peter Kontschieder; | In particular, in this work we show that, thanks to our virtual views generation process, a lightweight, single-stage architecture suffices to set new state-of-the-art results on the popular KITTI3D benchmark. |
1009 | Margin-Mix: Semi-Supervised Learning for Face Expression Recognition | Corneliu Florea; Mihai Badea; Laura Florea; Andrei Racoviteanu; Constantin Vertan; | In this paper, as we aim to construct a semi-supervised learning algorithm, we exploit the characteristics of the Deep Convolutional Networks to provide, for an input image, both an embedding descriptor and a prediction. |
1010 | Principal Feature Visualisation in Convolutional Neural Networks | Marianne Bakken; Johannes Kvam; Alexey A. Stepanov; Asbjørn Berge; | We introduce a new visualisation technique for CNNs called Principal Feature Visualisation (PFV). |
1011 | Progressive Refinement Network for Occluded Pedestrian Detection | Xiaolin Song; Kaili Zhao; Wen-Sheng Chu; Honggang Zhang; Jun Guo; | We present Progressive Refinement Network (PRNet), a novel single-stage detector that tackles occluded pedestrian detection. |
1012 | Monocular Real-Time Volumetric Performance Capture | Ruilong Li; Yuliang Xiu; Shunsuke Saito; Zeng Huang; Kyle Olszewski; Hao Li; | We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. |
1013 | The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale | Christian Ertler; Jerneja Mislej; Tobias Ollmann; Lorenzo Porzi; Gerhard Neuhold; Yubin Kuang; | In this paper, we introduce a new traffic sign dataset of 105K street-level images around the world covering 400 manually annotated traffic sign classes in diverse scenes, wide range of geographical locations, and varying weather and lighting conditions. |
1014 | Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction | Anil Armagan; Guillermo Garcia-Hernando; Seungryul Baek; Shreyas Hampali; Mahdi Rad; Zhaohui Zhang; Shipeng Xie; MingXiu Chen; Boshen Zhang; Fu Xiong; Yang Xiao; Zhiguo Cao; Junsong Yuan; Pengfei Ren; Weiting Huang; Haifeng Sun; Marek Hrúz; Jakub Kanis; Zdeněk Krňoul; Qingfu Wan; Shile Li; Linlin Yang; Dongheui Lee; Angela Yao; Weiguo Zhou; Sijia Mei; Yunhui Liu; Adrian Spurr; Umar Iqbal; Pavlo Molchanov; Philippe Weinzaepfel; Romain Brégier; Grégory Rogez; Vincent Lepetit; Tae-Kyun Kim; | To address these issues, we designed a public challenge (HANDS’19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. |
1015 | Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders | Sarthak Bhagat; Shagun Uppal; Zhuyun Yin; Nengli Lim; | We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences. |
1016 | SEN: A Novel Feature Normalization Dissimilarity Measure for Prototypical Few-Shot Learning Networks | Van Nhan Nguyen; Sigurd Løkse; Kristoffer Wickstrøm; Michael Kampffmeyer; Davide Roverso; Robert Jenssen; | In this paper, we equip Prototypical Networks (PNs) with a novel dissimilarity measure to enable discriminative feature normalization for few-shot learning. |
1017 | Kinematic 3D Object Detection in Monocular Video | Garrick Brazil; Gerard Pons-Moll; Xiaoming Liu; Bernt Schiele; | In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization. |
1018 | Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents | Ye Zhu; Yu Wu; Yi Yang; Yan Yan; | To this end, in this paper, we introduce a new task called video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames. |
1019 | SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding | Sangmin Lee; Jung Uk Kim; Hak Gu Kim; Seongyeop Kim; Yong Man Ro; | In this paper, we propose a novel symptom-aware cybersickness assessment network (SACA Net) that quantifies physical symptom levels for assessing cybersickness of individual viewers. |
1020 | End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention | Ziyi Meng; Jiawei Ma; Xin Yuan; | To make solid progress on this challenging yet under-investigated task, we reproduce a stable single disperser (SD) CASSI system to gather large-scale real-world CASSI data and propose a novel deep convolutional network to carry out the real-time reconstruction by using self-attention. |
1021 | Know Your Surroundings: Exploiting Scene Information for Object Tracking | Goutam Bhat; Martin Danelljan; Luc Van Gool; Radu Timofte; | In this work, we propose a novel tracking architecture which can utilize scene information for tracking. |
1022 | Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases | Ren Wang; Gaoyuan Zhang; Sijia Liu; Pin-Yu Chen; Jinjun Xiong; Meng Wang; | In this paper, we study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime, where only the weights of a trained DNN are accessed by the detector. |
1023 | Anatomy-Aware Siamese Network: Exploiting Semantic Asymmetry for Accurate Pelvic Fracture Detection in X-ray Images | Haomin Chen; Yirui Wang; Kang Zheng; Weijian Li; Chi-Tung Chang; Adam P. Harrison; Jing Xiao; Gregory D. Hager; Le Lu; Chien-Hung Liao; Shun Miao; | In this work, we present a new approach to fracture detection that uses a Siamese network to take advantage of the anatomical symmetry of pelvic structures to improve fracture detection. |
1024 | DeepLandscape: Adversarial Modeling of Landscape Videos | Elizaveta Logacheva; Roman Suvorov; Oleg Khomenko; Anton Mashikhin; Victor Lempitsky; | We propose simple but necessary modifications to StyleGAN inversion procedure, which lead to in-domain latent codes and allow to manipulate real images. |
1025 | GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images | Lei Kang; Pau Riba; Yaxing Wang; Marçal Rusiñol; Alicia Fornés; Mauricio Villegas; | In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words. |
1026 | Spatial-Angular Interaction for Light Field Image Super-Resolution | Yingqian Wang; Longguang Wang; Jungang Yang; Wei An; Jingyi Yu; Yulan Guo; | In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR. |
1027 | BATS: Binary ArchitecTure Search | Adrian Bulat; Brais Martinez; Georgios Tzimiropoulos; | This paper proposes Binary ArchitecTure Search (BATS), a framework that drastically reduces the accuracy gap between binary neural networks and their real-valued counterparts by means of Neural Architecture Search (NAS). |
1028 | A Closer Look at Local Aggregation Operators in Point Cloud Analysis | Ze Liu; Han Hu; Yue Cao; Zheng Zhang; Xin Tong; | In this paper, we revisit the representative local aggregation operators and study their performance using the same deep residual architecture. |
1029 | Look here! A parametric learning based approach to redirect visual attention | Youssef A. Mejjati; Celso F. Gomez; Kwang In Kim; Eli Shechtman; Zoya Bylinskii; | Motivated by professional work flows, we introduce an automatic method to make an image region more attention-capturing via subtle image edits that maintain realism and fidelity to the original. |
1030 | Variational Diffusion Autoencoders with Random Walk Sampling | Henry Li; Ofir Lindenbaum; Xiuyuan Cheng; Alexander Cloninger; | We propose a method that combines these approaches into a generative model that inherits the asymptotic guarantees of diffusion maps while preserving the scalability of deep models. |
1031 | Adaptive Variance Based Label Distribution Learning For Facial Age Estimation | Xin Wen; Biying Li; Haiyun Guo; Zhiwei Liu; Guosheng Hu; Ming Tang; Jinqiao Wang; | To model a sample-specific variance, in this paper, we propose an adaptive variance based distribution learning (AVDL) method for facial age estimation. |
1032 | Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency | Shasha Li; Shitong Zhu; Sudipta Paul; Amit Roy-Chowdhury; Chengyu Song; Srikanth Krishnamurthy; Ananthram Swami; Kevin S Chan; | In brief, our approach builds a set of autoencoders, one for each object class, appropriately trained so as to output a discrepancy between the input and output if a perturbation was added to the sample and trigger context violation. |
1033 | Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations | Abbas Sadat; Sergio Casas; Mengye Ren; Xinyu Wu; Pranaab Dhawan; Raquel Urtasun; | In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations. |
1034 | VarSR: Variational Super-Resolution Network for Very Low Resolution Images | Sangeek Hyun; Jae-Pil Heo; | In this paper, we propose VarSR, Variational Super Resolution Network, that matches latent distributions of LR and HR images to recover the missing details. |
1035 | Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation | Ashwin Raju; Chi-Tung Cheng; Yuankai Huo; Jinzheng Cai; Junzhou Huang; Jing Xiao; Le Lu; Chien-Hung Liao; Adam P. Harrison; | In this work, we present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe), which only requires a small labeled cohort of single phase data to adapt to any unlabeled cohort of heterogeneous multi-phase data with possibly new clinical scenarios and pathologies. |
1036 | Towards Recognizing Unseen Categories in Unseen Domains | Massimiliano Mancini; Zeynep Akata; Elisa Ricci; Barbara Caputo; | The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training. |
1037 | Square Attack: a query-efficient black-box adversarial attack via random search | Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; Matthias Hein; | We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$- adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. |
1038 | You Are Here: Geolocation by Embedding Maps and Images | Noe Samano; Mengjie Zhou; Andrew Calway; | We present a novel approach to geolocalising panoramic images on a 2-D cartographic map based on learning a low dimensional embedded space, which allows a comparison between an image captured at a location and local neighbourhoods of the map. |
1039 | Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation | Yang He; Shadi Rahimian; Bernt Schiele; Mario Fritz; | We present the first attacks and defenses for complex, state of the art models for semantic segmentation. |
1040 | From Image to Stability: Learning Dynamics from Human Pose | Jesse Scott; Bharadwaj Ravichandran; Christopher Funk; Robert T. Collins; Yanxi Liu; | We propose and validate two end-to-end deep learning architectures to learn foot pressure distribution maps (dynamics) from 2D or 3D human pose (kinematics). |
1041 | LevelSet R-CNN: A Deep Variational Method for Instance Segmentation | Namdar Homayounfar; Yuwen Xiong; Justin Liang; Wei-Chiu Ma; Raquel Urtasun; | We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations that are combined in an end-to-end manner with a variational segmentation framework. |
1042 | Efficient Scale-Permuted Backbone with Learned Resource Distribution | Xianzhi Du; Tsung-Yi Lin; Pengchong Jin; Yin Cui; Mingxing Tan; Quoc Le; Xiaodan Song; | In this work, we propose a simple technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture. |
1043 | Reducing Distributional Uncertainty by Mutual Information Maximisation and Transferable Feature Learning | Jian Gao; Yang Hua; Guosheng Hu; Chi Wang; Neil M. Robertson; | In this paper, we propose to formulate the distributional uncertainty both between the source(s) and target domain(s) and within each domain using mutual information. |
1044 | Bridging Knowledge Graphs to Generate Scene Graphs | Alireza Zareian; Svebor Karaman; Shih-Fu Chang; | In this paper, we present a unified formulation of these two constructs, where a scene graph is seen as an image-conditioned instantiation of a commonsense knowledge graph. |
1045 | Implicit Latent Variable Model for Scene-Consistent Motion Forecasting | Sergio Casas; Cole Gulino; Simon Suo; Katie Luo; Renjie Liao; Raquel Urtasun; | In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data. |
1046 | Learning Visual Commonsense for Robust Scene Graph Generation | Alireza Zareian; Zhecan Wang; Haoxuan You; Shih-Fu Chang; | We propose the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data, and use that to improve the robustness of scene understanding. |
1047 | MPCC: Matching Priors and Conditionals for Clustering | Nicolás Astorga; Pablo Huijse; Pavlos Protopapas; Pablo Estévez; | We propose Matching Priors and Conditionals for Clustering (MPCC), a GAN-based model with an encoder to infer latent variables and cluster categories from data, and a flexible decoder to generate samples from a conditional latent space. |
1048 | PointAR: Efficient Lighting Estimation for Mobile Augmented Reality | Yiqin Zhao; Tian Guo; | We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with comparable resource complexities to state-of-the-art mobile deep learning models. |
1049 | Discrete Point Flow Networks for Efficient Point Cloud Generation | Roman Klokov; Edmond Boyer; Jakob Verbeek; | We introduce a latent variable model that builds on normalizing flows with affine coupling layers to generate 3D point clouds of an arbitrary size given a latent shape representation. |
1050 | Accelerating Deep Learning with Millions of Classes | Zhuoning Yuan; Zhishuai Guo; Xiaotian Yu; Xiaoyu Wang; Tianbao Yang; | To address these issues, we propose an efficient training framework to handle extreme classification tasks based on Random Projection. |
1051 | Password-conditioned Anonymization and Deanonymization with Face Identity Transformers | Xiuye Gu; Weixin Luo; Michael S. Ryoo; Yong Jae Lee; | We propose a novel face identity transformer which enables automated photo-realistic password-based anonymization and deanonymization of human faces appearing in visual data. |
1052 | Inertial Safety from Structured Light | Sizhuo Ma; Mohit Gupta; | We present inertial safety maps (ISM), a novel scene representation designed for fast detection of obstacles in scenarios involving camera or scene motion, such as robot navigation and human-robot interaction. |
1053 | PointTriNet: Learned Triangulation of 3D Point Sets | Nicholas Sharp; Maks Ovsjanikov; | We present PointTriNet, a differentiable and scalable approach enabling point set triangulation as a layer in 3D learning pipelines. |
1054 | Toward Unsupervised, Multi-Object Discovery in Large-Scale Image Collections | Huy V. Vo; Patrick Pérez; Jean Ponce; | We build on the optimization approach of Vo et al. [34] with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods. |
1055 | Deep Novel View Synthesis from Colored 3D Point Clouds | Zhenbo Song; Wayne Chen; Dylan Campbell; Hongdong Li; | We propose a new deep neural network which takes a colored 3D point cloud of a scene, and directly synthesizes a photo-realistic image from an arbitrary viewpoint. |
1056 | Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Haoran Wang; Ying Zhang; Zhong Ji; Yanwei Pang; Lin Ma; | In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching. |
1057 | Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising | Guanting Dong; Yueyi Zhang; Zhiwei Xiong; | In this paper, we propose a Spatial Hierarchy Aware Residual Pyramid Network, called SHARP-Net, to remove the depth noise by fully exploiting the geometry information of the scene on different scales. |
1058 | Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding | Songtao He; Favyen Bastani; Satvat Jagwani; Mohammad Alizadeh; Hari Balakrishnan; Sanjay Chawla; Mohamed M. Elshrif; Samuel Madden; Mohammad Amin Sadeghi; | In this paper, we propose a new method, Sat2Graph, which combines the advantages of the two prior categories into a unified framework. |
1059 | Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition | Di Hu; Xuhong Li; Lichao Mou; Pu Jin; Dong Chen; Liping Jing; Xiaoxiang Zhu; Dejing Dou; | Inspired by the multi-channel perception theory in cognition science, in this paper, for improving the performance on the aerial scene recognition, we explore a novel audiovisual aerial scene recognition task using both images and sounds as input. |
1060 | Polarimetric Multi-View Inverse Rendering | Jinyu Zhao; Yusuke Monno; Masatoshi Okutomi; | In this paper, we propose a novel 3D reconstruction method called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that effectively exploits geometric, photometric, and polarimetric cues extracted from input multi-view color polarization images. |
1061 | SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information | Jing Yu Koh; Duc Thanh Nguyen; Quang-Trung Truong; Sai-Kit Yeung; Alexander Binder; | Inspired by the practicality and applicability of the semi-automatic approach, this paper proposes a novel deep neural network architecture, namely SideInfNet that effectively integrates features learnt from images with side information extracted from user annotations. |
1062 | Improving Face Recognition by Clustering Unlabeled Faces in the Wild | Aruni RoyChowdhury; Xiang Yu; Kihyuk Sohn; Erik Learned-Miller; Manmohan Chandraker; | To address this, we propose a novel identity separation method based on extreme value theory. |
1063 | NeuRoRA: Neural Robust Rotation Averaging | Pulak Purkait; Tat-Jun Chin; Ian Reid; | In this work, we aim to build a neural network that learns the noise patterns from the data and predict/regress the model parameters from the noisy relative orientations. |
1064 | SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes | Pulak Purkait; Christopher Zach; Ian Reid; | In this work, we propose a neural network to learn a generative model for sampling consistent indoor scene layouts. |
1065 | Unsupervised Learning of Optical Flow with Deep Feature Similarity | Woobin Im; Tae-Kyun Kim; Sung-Eui Yoon; | In this work, rather than the handcrafted features i.e. census or pixel values, we propose to use deep self-supervised features with a novel similarity measure, which fuses multi-layer similarities. |
1066 | Blended Grammar Network for Human Parsing | Xiaomei Zhang; Yingying Chen; Bingke Zhu; Jinqiao Wang; Ming Tang; | In this paper, we propose a Blended Grammar Network (BGNet), to deal with the challenge. |
1067 | P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation | Zehao Yu; Lei Jin; Shenghua Gao; | In this paper, we argue that the poor performance suffers from the non-discriminative point-based matching. |
1068 | Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs | Van-Quang Nguyen; Masanori Suganuma; Takayuki Okatani; | In this paper, we present a neural architecture named Light-weight Transformer for Many Inputs (LTMI) that can efficiently deal with all the interactions between multiple such inputs in visual dialog. |
1069 | Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting | Xiyang Liu; Jie Yang; Wenrui Ding; Tieqiang Wang; Zhijin Wang; Junjun Xiong; | To solve this problem, we introduce a new target, named local counting map (LCM), to obtain more accurate results than density map based approaches. |
1070 | BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging | Ziheng Cheng; Ruiying Lu; Zhengjue Wang; Hao Zhang; Bo Chen; Ziyi Meng; Xin Yuan; | We consider the problem of video snapshot compressive imaging (SCI), where multiple high-speed frames are coded by different masks and then summed to a single measurement. |
1071 | Ultra Fast Structure-aware Deep Lane Detection | Zequn Qin; Huanyu Wang; Xi Li; | Motivated by this observation, we propose a novel, simple, yet effective formulation aiming at extremely fast speed and challenging scenarios. |
1072 | Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling | Subin Jeon; Seonghyeon Nam; Seoung Wug Oh; Seon Joo Kim; | We propose attention-based networks for transferring motions between arbitrary objects. |
1073 | Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN | Zhenwei He; Lei Zhang; | Therefore, in order to avoid the source domain collapse risk caused by parameter sharing, we propose an asymmetric tri-way Faster-RCNN (ATF) for domain adaptive object detection. |
1074 | Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition | Xiaobo Wang; Tianyu Fu; Shengcai Liao; Shuo Wang; Zhen Lei; Tao Mei; | In this paper, we propose a novel position-aware exclusivity to encourage large diversity among different filters of the same layer to alleviate the low-capability of student network. |
1075 | Learning Camera-Aware Noise Models | Ke-Chi Chang; Ren Wang; Hung-Jin Lin; Yu-Lun Liu; Chia-Ping Chen; Yu-Lin Chang; Hwann-Tzong Chen; | To tackle this issue, we propose a data-driven approach, where a generative noise model is learned from real-world noise. |
1076 | Towards Precise Completion of Deformable Shapes | Oshri Halimi; Ido Imanuel; Or Litany; Giovanni Trappolini; Emanuele Rodolà Leonidas Guibas; Ron Kimmel; | More specifically, given the geometry of a full, articulated object in a given pose, as well as a partial scan of the same object in a different pose, we address the new problem of matching the part to the whole while simultaneously reconstructing the new pose from its partial observation. |
1077 | Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration | Jiahao Li; Changhao Zhang; Ziyao Xu; Hangning Zhou; Chi Zhang; | In this paper, we propose a novel learning-based pipeline for partially overlapping 3D point cloud registration. |
1078 | Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization | Amir Rahimi; Amirreza Shaban; Thalaiyasingam Ajanthan; Richard Hartley; Byron Boots; | We study the problem of learning a localization model on target classes with weakly supervised image labels, helped by a fully annotated source dataset. |
1079 | Environment-agnostic Multitask Learning for Natural Language Grounded Navigation | Xin Eric Wang; Vihan Jain; Eugene Ie; William Yang Wang; Zornitsa Kozareva; Sujith Ravi; | To close the gap between seen and unseen environments, we aim at learning a generalized navigation model from two novel perspectives: (1) we introduce a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks, which benefits from richer natural language guidance and effectively transfers knowledge across tasks; (2) we propose to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments. |
1080 | TPFN: Applying Outer Product along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data | Binghua Li; Chao Li; Feng Duan; Ning Zheng; Qibin Zhao; | To this end, we propose a novel network architecture termed Time Product Fusion Network (TPFN), which takes the high-order statistics over both modalities and temporal dynamics into account. |
1081 | ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis | Eu Wern Teh; Terrance DeVries; Graham W. Taylor; | We consider the problem of distance metric learning (DML), where the task is to learn an effective similarity measure between images. |
1082 | Learning with Privileged Information for Efficient Image Super-Resolution | Wonkyung Lee; Junghyup Lee; Dohyung Kim; Bumsub Ham; | We introduce in this paper a novel distillation framework, consisting of teacher and student networks, that allows to boost the performance of FSRCNN drastically. |
1083 | Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification | Jianing Li; Shiliang Zhang; | This paper tackles this challenge through jointly enforcing visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification. |
1084 | Autoencoder-based Graph Construction for Semi-supervised Learning | Mingeun Kang; Kiwon Lee; Yong H. Lee; Changho Suh; | In this paper, we propose a holistic approach that employs a parameterized neural-net-based autoencoder for matrix completion, thereby enabling simultaneous training between models of the classifier and matrix completion. |
1085 | Virtual Multi-view Fusion for 3D Semantic Segmentation | Abhijit Kundu; Xiaoqi Yin; Alireza Fathi; David Ross; Brian Brewington; Thomas Funkhouser; Caroline Pantofaru; | In this paper we revisit the classic multiview representation of 3D meshes and study several techniques that make them effective for 3D semantic segmentation of meshes. |
1086 | Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition | Ke Cheng; Yifan Zhang; Congqi Cao; Lei Shi; Jian Cheng; Hanqing Lu; | In this paper, we rethink the spatial aggregation in existing GCN-based skeleton action recognition methods and discover that they are limited by coupling aggregation mechanism. |
1087 | Deep Shape from Polarization | Yunhao Ba; Alex Gilbert; Franklin Wang; Jinfa Yang; Rui Chen; Yiqin Wang; Lei Yan; Boxin Shi; Achuta Kadambi; | This paper makes a first attempt to bring the Shape from Polarization (SfP) problem to the realm of deep learning. |
1088 | A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning | Xingyu Chen; Xuguang Lan; Fuchun Sun; Nanning Zheng; | To resolve this problem, in this paper, we propose a boundary based Out-of-Distribution (OOD) classifier which classifies the unseen and seen domains by only using seen samples for training. |
1089 | Mind the Discriminability: Asymmetric Adversarial Domain Adaptation | Jianfei Yang; Han Zou; Yuxun Zhou; Zhaoyang Zeng; Lihua Xie; | In this paper, we tackle this problem by designing a simple yet effective scheme, namely Asymmetric Adversarial Domain Adaptation (AADA). |
1090 | SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates | Zhizhong Han; Guanhui Qiao; Yu-Shen Liu; Matthias Zwicker; | To avoid dense and irregular sampling in 3D, we propose to represent shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape. |
1091 | Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking | ShiJie Sun; Naveed Akhtar; XiangYu Song; HuanSheng Song; Ajmal Mian; Mubarak Shah; | To resolve this issue, we introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects’ motion parameters to perform joint detection and association in an end-to-end manner. |
1092 | Deep FusionNet for Point Cloud Semantic Segmentation | Feihu Zhang; Jin Fang; Benjamin Wah; Philip Torr; | To address these issues, we propose a deep fusion network architecture (FusionNet) with a unique voxel-based mini-PointNet point cloud representation and a new feature aggregation module (fusion module) for large-scale 3D semantic segmentation. |
1093 | Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information | Bichuan Guo; Jiangtao Wen; Yuxing Han; | In this paper, we propose an approach that achieves decoupling of angular and spatial information by establishing correspondences in the angular domain, then employs regularization to enforce a rotational invariance. |
1094 | Dual Adversarial Network for Deep Active Learning | Shuo Wang; Yuexiang Li; Kai Ma; Ruhui Ma; Haibing Guan; Yefeng Zheng; | In this paper, we investigate the overlapping problem of recent uncertainty-based approaches and propose to alleviate the issue by taking representativeness into consideration. |
1095 | Fully Convolutional Networks for Continuous Sign Language Recognition | Ka Leong Cheng; Zhaoyang Yang; Qifeng Chen; Yu-Wing Tai; | In this paper, we propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences with only sentence-level annotations given. |
1096 | Self-adapting confidence estimation for stereo | Matteo Poggi; Filippo Aleotti; Fabio Tosi; Giulio Zaccaroni; Stefano Mattoccia; | In this paper, we propose a flexible and lightweight solution enabling self-adapting confidence estimation agnostic to the stereo algorithm or network. |
1097 | Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention | Quewei Li; Jie Guo; Yang Fei; Qinyu Tang; Wenxiu Sun; Jin Zeng; Yanwen Guo; | We propose a deep convolutional neural network (CNN) to estimate surface normal from a single color image accompanied with a low-quality depth channel. |
1098 | AutoSTR: Efficient Backbone Search for Scene Text Recognition | Hui Zhang; Quanming Yao; Mingkun Yang; Yongchao Xu; Xiang Bai; | In this work, inspired by the success of neural architecture search (NAS), we propose automated STR (AutoSTR), which can address the above issue by searching data-dependent backbones. |
1099 | Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification | Sungwon Han; Sungwon Park; Sungkyu Park; Sundong Kim; Meeyoung Cha; | To address this limitation, we propose a novel two-stage algorithm in which an embedding module for pretraining precedes a refining module that concurrently performs embedding and class assignment. |
1100 | Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification | Weitao Wan; Jiansheng Chen; Ming-Hsuan Yang; | We propose that this problem can be solved by explicitly modeling the deep feature distribution, for example as a Gaussian Mixture, and then properly introducing the likelihood regularization into the loss function. |
1101 | Faster AutoAugment: Learning Augmentation Strategies Using Backpropagation | Ryuichiro Hataya; Zdenek Jan; Kazuki Yoshizoe; Hideki Nakayama; | In this paper, we propose a differentiable policy search pipeline for data augmentation, which is much faster than previous methods. |
1102 | Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation | Lin Huang; Jianchao Tan; Ji Liu; Junsong Yuan; | To borrow wisdom from this structured learning framework while avoiding the sequential modeling for hand pose, taking a 3D point set as input, we propose to leverage the Transformer architecture with a novel non-autoregressive structured decoding mechanism. |
1103 | Boundary-Aware Cascade Networks for Temporal Action Segmentation | Zhenzhi Wang; Ziteng Gao; Limin Wang; Zhifeng Li; Gangshan Wu; | To address these problems, we present a new boundary-aware cascade network by introducing two novel components. |
1104 | Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation | Xu Yan; Weibing Zhao; Kun Yuan; Ruimao Zhang; Zhen Li; Shuguang Cui; | This work investigates a novel multi-reference based super-resolution problem by proposing a Content Independent Multi-Reference Super-Resolution (CIMR-SR) model, which is able to adaptively match the visual pattern between references and target image in the low resolution and enhance the feature representation of the target image in the higher resolution. |
1105 | Inference Graphs for CNN Interpretation | Yael Konforti; Alon Shpigler; Boaz Lerner; Aharon Bar-Hillel; | We propose to model the network hidden layers activity using probabilistic models. |
1106 | An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | Liangcheng Li; Feiyu Gao; Jiajun Bu; Yongpan Wang; Zhi Yu; Qi Zheng; | To tackle the above problems, we propose a novel end-to-end OCR text reorganizing model. |
1107 | Improving Query Efficiency of Black-box Adversarial Attack | Yang Bai; Yuyuan Zeng; Yong Jiang; Yisen Wang; Shu-Tao Xia; Weiwei Guo; | Therefore, in order to improve query efficiency, we explore the distribution of adversarial examples around benign inputs with the help of image structure information characterized by a Neural Process, and propose a Neural Process based black-box adversarial attack (NP-Attack) in this paper. |
1108 | Self-similarity Student for Partial Label Histopathology Image Segmentation | Hsien-Tzu Cheng; Chun-Fu Yeh; Po-Chen Kuo; Andy Wei; Keng-Chi Liu; Mong-Chi Ko; Kuan-Hua Chao; Yu-Ching Peng; Tyng-Luh Liu; | To learn from these patches, we propose Self-similarity Student, combining teacher-student model paradigm with similarity learning. |
1109 | BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions | Arslan Ali; Matteo Testa; Tiziano Bianchi; Enrico Magli; | We present BioMetricNet: a novel framework for deep unconstrained face verification which learns a regularized metric to compare facial features. |
1110 | A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images | Zhetong Liang; Shi Guo; Hong Gu; Huaqi Zhang; Lei Zhang; | In this paper, a novel multi-frame CNN model is carefully designed, which decouples the learning of motion from the learning of noise statistics. |
1111 | Global-and-Local Relative Position Embedding for Unsupervised Video Summarization | Yunjae Jung; Donghyeon Cho; Sanghyun Woo; In So Kweon; | In this paper, we therefore present a novel input decomposition strategy, which samples the input both globally and locally. |
1112 | Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms | Jaesung Rim; Haeyun Lee; Jucheol Won; Sunghyun Cho; | In this work, we present a large-scale dataset of real-world blurred images and ground truth sharp images for learning and benchmarking single image deblurring methods. |
1113 | SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking | Qing Guo; Xiaofei Xie; Felix Juefei-Xu; Lei Ma; Zhongguo Li; Wanli Xue; Wei Feng; Yang Liu; | In this paper, we identify a new task for the adversarial attack to visual tracking: online generating imperceptible perturbations that mislead trackers along with an incorrect (Untargeted Attack, UA) or specified trajectory (Targeted Attack, TA). |
1114 | CenterNet Heatmap Propagation for Real-time Video Object Detection | Zhujun Xu; Emir Hrustic; Damien Vivet; | In this work, we introduce a method based on a one-stage detector called CenterNet. |
1115 | Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection | Youwei Pang; Lihe Zhang; Xiaoqi Zhao; Huchuan Lu; | In the end, we implement a kind of more flexible and efficient multi-scale cross-modal feature processing, i.e. dynamic dilated pyramid module. |
1116 | SOLAR: Second-Order Loss and Attention for Image Retrieval | Tony Ng; Vassileios Balntas; Yurun Tian; Krystian Mikolajczyk; | In this work, we explore two second-order components. One is focused on second-order spatial information to increase the performance of image descriptors, both local and global. It is used to re-weight feature maps, and thus emphasise salient image locations that are subsequently used for description. The second component is concerned with a second-order similarity (SOS) loss, that we extend to global descriptors for image retrieval, and is used to enhance the triplet loss with hard-negative mining. |
1117 | Fixing Localization Errors to Improve Image Classification | Guolei Sun; Salman Khan; Wen Li; Hisham Cholakkal; Fahad Shahbaz Khan; Luc Van Gool; | In this work, we explore a new direction towards the possible use of CAM in deep network learning process. |
1118 | PatchPerPix for Instance Segmentation | Lisa Mais; Peter Hirsch; Dagmar Kainmueller; | In this paper we present a novel method for proposal free instance segmentation that can handle sophisticated object shapes that span large parts of an image and form dense object clusters with crossovers. |
1119 | Attend and Segment: Attention Guided Active Semantic Segmentation | Soroush Seifi; Tinne Tuytelaars; | In this paper we propose a method to gradually segment a scene given a sequence of partial observations. |
1120 | Accelerating CNN Training by Pruning Activation Gradients | Xucheng Ye; Pengcheng Dai; Junyu Luo; Xin Guo; Yingjie Qi; Jianlei Yang; Yiran Chen; | Hence, we consider pruning these very small gradients randomly to accelerate CNN training according to the statistical distribution of activation gradients. |
1121 | Global and Local Enhancement Networks for Paired and Unpaired Image Enhancement | Han-Ul Kim; Young Jun Koh; Chang-Su Kim; | A novel approach for paired and unpaired image enhancement is proposed in this work. |
1122 | Probabilistic Anchor Assignment with IoU Prediction for Object Detection | Kang Kim; Hee Seok Lee; | In this paper we propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model’s learning status such that it is able to reason the separation in a probabilistic manner. |
1123 | Eyeglasses 3D shape reconstruction from a single face image | Yating Wang; Quan Wang; Feng Xu; | In this paper, we present an automatic system that recovers the 3D shape of eyeglasses from a single face image with an arbitrary head pose. |
1124 | Temporal Complementary Learning for Video Person Re-Identification | Ruibing Hou; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen; | This paper proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification. |
1125 | HoughNet: Integrating near and long-range evidence for bottom-up object detection | Nermin Samet; Samet Hicsonmez; Emre Akbas; | This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. |
1126 | Graph Wasserstein Correlation Analysis for Movie Retrieval | Xueya Zhang; Tong Zhang; Xiaobin Hong; Zhen Cui; Jian Yang; | In this work, we propose Graph Wasserstein Correlation Analysis (GWCA) to deal with the core issue therein, i.e., cross heterogeneous graph comparison. |
1127 | Context-Aware RCNN: A Baseline for Action Detection in Videos | Jianchao Wu; Zhanghui Kuang; Limin Wang; Wayne Zhang; Gangshan Wu; | Thus, we revisit RCNN for actor-centric action recognition via cropping and resizing image patches around actors before feature extraction with I3D deep network. |
1128 | Full-Time Monocular Road Detection Using Zero-Distribution Prior of Angle of Polarization | Ning Li; Yongqiang Zhao; Quan Pan; Seong G. Kong; Jonathan Cheung-Wai Chan; | This paper presents a road detection technique based on long-wave infrared (LWIR) polarization imaging for autonomous navigation regardless of illumination conditions, day and night. |
1129 | A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation | Haoxian Zhang; Yang Zhao; Ronggang Wang; | Inspired by classical pyramid energy minimization optical flow algorithms, this paper proposes a recurrent residual pyramid network (RRPN) for video frame interpolation. |
1130 | Learning Enriched Features for Real Image Restoration and Enhancement | Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; Ling Shao; | In this paper, we present an architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations. |
1131 | Detail Preserved Point Cloud Completion via Separated Feature Aggregation | Wenxiao Zhang; Qingan Yan; Chunxia Xiao; | In this work, instead of using a global feature to recover the whole complete surface, we explore multi-level features by hierarchical feature learning and represent the existing-part and the missing-part respectively. |
1132 | LabelEnc: A New Intermediate Supervision Method for Object Detection | Miao Hao; Yitao Liu; Xiangyu Zhang; Jian Sun; | In this paper we propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems. |
1133 | Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets | Clara Fernandez-Labrador; Ajad Chhatkuli; Danda Pani Paudel; Jose J. Guerrero; Cédric Demonceaux; Luc Van Gool; | This paper aims at learning such 3D keypoints, in an unsupervised manner, using a collection of misaligned 3D point clouds of objects from an unknown category. |
1134 | PAMS: Quantized Super-Resolution via Parameterized Max Scale | Huixia Li; Chenqian Yan; Shaohui Lin; Xiawu Zheng; Baochang Zhang; Fan Yang; Rongrong Ji; | To address these two issues, we propose a new quantization scheme termed PArameterized Max Scale (PAMS), which applies the trainable truncated parameter to explore the upper bound of the quantization range adaptively. |
1135 | SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds | Xinge Zhu; Yuexin Ma; Tai Wang; Yan Xu; Jianping Shi; Dahua Lin; | In this paper, we propose a novel 3D shape signature to explore the shape information from point clouds. |
1136 | OID: Outlier Identifying and Discarding in Blind Image Deblurring | Liang Chen; Faming Fang; Jiawei Zhang; Jun Liu; Guixu Zhang; | To address these problems, this paper develops a simple yet effective Outlier Identifying and Discarding (OID) method, which alleviates limitations in existing Maximum A Posteriori (MAP)-based deblurring models when significant outliers are present. |
1137 | Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors | Mateusz Michalkiewicz; Sarah Parisot; Stavros Tsogkas; Mahsa Baktashmotlagh; Anders Eriksson; Eugene Belilovsky; | In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a few-shot learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization. |
1138 | Enhanced Sparse Model for Blind Deblurring | Liang Chen; Faming Fang; Shen Lei; Fang Li; Guixu Zhang; | In this paper, we develop a new term to better fit the complex natural noise. |
1139 | SumGraph: Video Summarization via Recursive Graph Modeling | Jungin Park; Jiyoung Lee; Ig-Jae Kim; Kwanghoon Sohn; | We propose recursive graph modeling networks for video summarization, termed SumGraph, to represent a relation graph, where frames are regarded as nodes and nodes are connected by semantic relationships among frames. |
1140 | Feature Normalized Knowledge Distillation for Image Classification | Kunran Xu; Lai Rui; Yishi Li; Lin Gu; | From this perspective, we systematically analyze the distillation mechanism and demonstrate that the L2-norm of the feature in penultimate layer would be too large under the influence of label noise, and the temperature T in KD could be regarded as a correction factor for L2-norm to suppress the impact of noise. |
1141 | A Metric Learning Reality Check | Kevin Musgrave; Serge Belongie; Ser-Nam Lim; | Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look at the field to see if this is actually true. |
1142 | FTL: A universal framework for training low-bit DNNs via Feature Transfer | Kunyuan Du; Ya Zhang; Haibing Guan; Qi Tian; Shenggan Cheng; James Lin; | Here we introduce a novel feature-based knowledge transfer framework, which utilizes a 32-bit DNN to guide the training of a low-bit DNN via feature maps. |
1143 | XingGAN for Person Image Generation | Hao Tang; Song Bai; Li Zhang; Philip H.S. Torr; Nicu Sebe; | We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one. |
1144 | GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering | Chuang Niu; Jun Zhang; Ge Wang; Jimin Liang; | We propose a self-supervised Gaussian ATtention network for image Clustering (GATCluster). |
1145 | VCNet: A Robust Approach to Blind Image Inpainting | Yi Wang; Ying-Cong Chen; Xin Tao; Jiaya Jia; | In this paper, we relax the assumption by defining a new blind inpainting setting, making training a blind inpainting neural system robust against various unknown missing region patterns. |
1146 | Learning to Predict Context-adaptive Convolution for Semantic Segmentation | Jianbo Liu; Junjun He; Yu Qiao; Jimmy S. Ren; Hongsheng Li; | In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. |
1147 | EfficientFCN: Holistically-guided Decoding for Semantic Segmentation | Jianbo Liu; Junjun He; Jiawei Zhang; Jimmy S. Ren; Hongsheng Li; | In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pretrained network without any dilated convolution. |
1148 | GroSS: Group-Size Series Decomposition for Grouped Architecture Search | Henry Howard-Jenkins; Yiwen Li; Victor Adrian Prisacariu; | We present a novel approach which is able to explore the configuration of grouped convolutions within neural networks. |
1149 | Efficient Adversarial Attacks for Visual Object Tracking | Siyuan Liang; Xingxing Wei; Siyuan Yao; Xiaochun Cao; | We present an end-to-end network FAN (Fast Attack Network) that uses a novel drift loss combined with the embedded feature loss to attack the Siamese network based trackers. |
1150 | Globally-Optimal Event Camera Motion Estimation | Xin Peng; Yifu Wang; Ling Gao; Laurent Kneip; | The present paper looks at fronto-parallel motion estimation of an event camera. |
1151 | Weakly-supervised Learning of Human Dynamics | Petrissa Zell; Bodo Rosenhahn; Bastian Wandt; | This paper proposes a weakly-supervised learning framework for dynamics estimation from human motion. |
1152 | Journey Towards Tiny Perceptual Super-Resolution | Royson Lee; Łukasz Dudziak; Mohamed Abdelfattah; Stylianos I. Venieris; Hyeji Kim; Hongkai Wen; Nicholas D. Lane; | In this work, we propose a neural architecture search (NAS) approach that integrates NAS and generative adversarial networks (GANs) with recent advances in perceptual SR and pushes the efficiency of small perceptual SR models to facilitate on-device execution. |
1153 | What makes fake images detectable? Understanding properties that generalize | Lucy Chai; David Bau; Ser-Nam Lim; Phillip Isola; | We seek to understand what properties of these fake images make them detectable and identify what generalizes across different model architectures, datasets, and variations in training. |
1154 | Embedding Propagation: Smoother Manifold for Few-Shot Classification | Pau Rodríguez; Issam Laradji; Alexandre Drouin; Alexandre Lacoste; | In this work, we propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification. |
1155 | Category Level Object Pose Estimation via Neural Analysis-by-Synthesis | Xu Chen; Zijian Dong; Jie Song; Andreas Geiger; Otmar Hilliges; | In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering the need for explicit CAD models per object instance unnecessary. |
1156 | High-Fidelity Synthesis with Disentangled Representation | Wonkwang Lee; Donggyun Kim; Seunghoon Hong; Honglak Lee; | We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that can easily incorporate the existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. |
1157 | PL₁P – Point-line Minimal Problems under Partial Visibility in Three Views | Timothy Duff; Kathlén Kohn; Anton Leykin; Tomas Pajdla; | We present a complete classification of minimal problems for generic arrangements of points and lines in space observed partially by three calibrated perspective cameras when each line is incident to at most one point. |
1158 | Prediction and Recovery for Adaptive Low-Resolution Person Re-Identification | Ke Han; Yan Huang; Zerui Chen; Liang Wang; Tieniu Tan; | In this paper, we propose a novel Prediction, Recovery and Identification (PRI) model for LR re-id, which adaptively recovers missing details by predicting a preferable scale factor based on the image content. |
1159 | Learning Canonical Representations for Scene Graph to Image Generation | Roei Herzig; Amir Bar; Huijuan Xu; Gal Chechik; Trevor Darrell; Amir Globerson; | In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. |
1160 | Adversarial Robustness on In- and Out-Distribution Improves Explainability | Maximilian Augustin; Alexander Meinke; Matthias Hein; | In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. |
1161 | Deformable Style Transfer | Sunnie S. Y. Kim; Nicholas Kolkin; Jason Salavon; Gregory Shakhnarovich; | We propose deformable style transfer (DST), an optimization-based approach that jointly stylizes the texture and geometry of a content image to better match a style image. |
1162 | Aligning Videos in Space and Time | Senthil Purushwalkam; Tian Ye; Saurabh Gupta; Abhinav Gupta; | In this paper, we focus on the task of extracting visual correspondences across videos. |
1163 | Neural Wireframe Renderer: Learning Wireframe to Image Translations | Yuan Xue; Zihan Zhou; Xiaolei Huang; | In this paper, we bridge the information gap by generating photo-realistic rendering of indoor scenes from wireframe models in an image translation framework. |
1164 | RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax | Xiao Zhang; Rui Zhao; Yu Qiao; Hongsheng Li; | To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, such that it can adaptively assign losses to regularize the intra-class and inter-class distances by reshaping the relative differences, thus creating more representative prototypes of classes to improve optimization. |
1165 | Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction | Kelvin Wong; Qiang Zhang; Ming Liang; Bin Yang; Renjie Liao; Abbas Sadat; Raquel Urtasun; | We present a novel method for testing the safety of self-driving vehicles in simulation. |
1166 | Determining the Relevance of Features for Deep Neural Networks | Christian Reimers; Jakob Runge; Joachim Denzler; | In this work, we present a novel method to identify whether a specific feature is relevant to a classifier’s decision or not. |
1167 | Weakly Supervised Semantic Segmentation with Boundary Exploration | Liyi Chen; Weiwei Wu; Chenchen Fu; Xiao Han; Yuntao Zhang; | To obtain semantic segmentation under weak supervision, this paper presents a simple yet effective approach based on the idea of explicitly exploring object boundaries from training images to keep coincidence of segmentation and boundaries. |
1168 | GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation | Wallace Lira; Johannes Merz; Daniel Ritchie; Daniel Cohen-Or; Hao Zhang; | We introduce GANHopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. |
1169 | DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild | Philippe Weinzaepfel; Romain Brégier; Hadrien Combaluzier; Vincent Leroy; Grégory Rogez; | We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. |
1170 | Multi-view adaptive graph convolutions for graph classification | Nikolas Adaloglou; Nicholas Vretos; Petros Daras; | In this paper, a novel multi-view methodology for graph-based neural networks is proposed. |
1171 | Instance Adaptive Self-Training for Unsupervised Domain Adaptation | Ke Mei; Chuang Zhu; Jiaqi Zou; Shanghang Zhang; | In this paper, we propose an instance adaptive self-training framework for UDA on the task of semantic segmentation. |
1172 | Weight Decay Scheduling and Knowledge Distillation for Active Learning | Juseung Yun; Byungjoo Kim; Junmo Kim; | However, in this paper, we focus on the data-incremental nature of active learning, and propose a method for properly tuning the weight decay as the amount of data increases. |
1173 | HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs | Hai Victor Habi; Roy H. Jennings; Arnon Netzer; | In this work, we introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ) in order to meet this requirement. |
1174 | Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning | Christopher Zach; Huu Le; | We aim to remove the need to maintain the latent variables and propose two formally justified methods, that dynamically adapt the required accuracy of latent variable inference. |
1175 | Geometry Constrained Weakly Supervised Object Localization | Weizeng Lu; Xi Jia; Weicheng Xie; Linlin Shen; Yicong Zhou; Jinming Duan; | We propose a geometry constrained network, termed GCNet, for weakly supervised object localization (WSOL). |
1176 | Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning | Kshitij Dwivedi; Jiahui Huang; Radoslaw Martin Cichy; Gemma Roig; | In this paper, we tackle an open research question in transfer learning, which is selecting a model initialization to achieve high performance on a new task, given several pre-trained models. |
1177 | OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering | Yaniv Benny; Lior Wolf; | We present a method for simultaneously learning, in an unsupervised manner, (i) a conditional image generator, (ii) foreground extraction and segmentation, (iii) clustering into a two-level class hierarchy, and (iv) object removal and background completion, all done without any use of annotation. |
1178 | Mining self-similarity: Label super-resolution with epitomic representations | Nikolay Malkin; Anthony Ortiz; Nebojsa Jojic; | We derive a new training algorithm for epitomes which allows, for the first time, learning from very large data sets and derive a label super-resolution algorithm as a statistical inference algorithm over epitomic representations. |
1179 | AE-OT-GAN: Training GANs from data specific latent distribution | Dongsheng An; Yang Guo; Min Zhang; Xin Qi; Na Lei; Xianfang Gu; | In this paper, we propose the AE-OT-GAN model to utilize the advantages of both models: generating high quality images while overcoming the mode collapse/mixture problems. |
1180 | Null-sampling for Interpretable and Fair Representations | Thomas Kehrenberg; Myles Bartlett; Oliver Thomas; Novi Quadrianto; | We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness. |
1181 | Guiding Monocular Depth Estimation Using Depth-Attention Volume | Lam Huynh; Phong Nguyen-Ha; Jiri Matas; Esa Rahtu; Janne Heikkilä; | In this paper, we propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments. |
1182 | Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping | Adam W. Harley; Shrinidhi Kowshika Lakshmikanth; Paul Schydlo; Katerina Fragkiadaki; | We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time. |
1183 | Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer | Yuanyi Zhong; Jianfeng Wang; Jian Peng; Lei Zhang; | In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain. |
1184 | BézierSketch: A generative model for scalable vector sketches | Ayan Das; Yongxin Yang; Timothy Hospedales; Tao Xiang; Yi-Zhe Song; | In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. |
1185 | Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation | Zeqi Li; Ruowei Jiang; Parham Aarabi; | In this work, we propose a novel method to address this problem by applying knowledge distillation together with distillation of a semantic relation preserving matrix. |
1186 | Domain Adaptation Through Task Distillation | Brady Zhou; Nimit Kalra; Philipp Krähenbühl; | We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework. |
1187 | PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning | Chenglin Yang; Adam Kortylewski; Cihang Xie; Yinzhi Cao; Alan Yuille; | Our proposed PatchAttack is query efficient and can break models for both targeted and non-targeted attacks. |
1188 | More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning | Yu Liu; Sarah Parisot; Gregory Slabaugh; Xu Jia; Ales Leonardis; Tinne Tuytelaars; | Since those regularization strategies are mostly associated with classifier outputs, we propose a MUlti-Classifier (MUC) incremental learning paradigm that integrates an ensemble of auxiliary classifiers to estimate more effective regularization constraints. |
1189 | Extending and Analyzing Self-Supervised Learning Across Domains | Bram Wallace; Bharath Hariharan; | We discover, among other findings, that Rotation is the most semantically meaningful task, while much of the performance of Jigsaw is attributable to the nature of its induced distribution rather than semantic understanding. |
1190 | Multi-Source Open-Set Deep Adversarial Domain Adaptation | Sayan Rakshit; Dipesh Tamboli; Pragati Shuddhodhan Meshram; Biplab Banerjee; Gemma Roig; Subhasis Chaudhuri; | As a remedy, we propose a novel adversarial learning-driven approach to deal with the MS-OSDA setup. |
1191 | Neural Batch Sampling with Reinforcement Learning for Semi-Supervised Anomaly Detection | Wen-Hsuan Chu; Kris M. Kitani; | In particular, we propose a novel semi-supervised learning algorithm for anomaly detection and segmentation using an anomaly classifier that uses as input the loss profile of a data sample processed through an autoencoder. |
1192 | LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities | Baoxiong Jia; Yixin Chen; Siyuan Huang; Yixin Zhu; Song-Chun Zhu; | We introduce the LEMMA dataset to provide a single home to address these missing dimensions with carefully designed settings, wherein the numbers of tasks and agents vary to highlight different learning objectives. |
1193 | Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images | Matthew Purri; Kristin Dana; | In this work, we introduce the challenging task of estimating a set of tactile physical properties from visual information. |
1194 | Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion | José Pedro Iglesias; Carl Olsson; Marcus Valtonen Örnhag; | In this paper we show that more accurate results can in many cases be achieved with 2nd order methods. |
1195 | Proposal-based Video Completion | Yuan-Ting Hu; Heng Wang; Nicolas Ballas; Kristen Grauman; Alexander G. Schwing; | In contrast, in this paper, we propose a video inpainting algorithm based on proposals: we use 3D convolutions to obtain an initial inpainting estimate which is subsequently refined by fusing a generated set of proposals. |
1196 | HGNet: Hybrid Generative Network for Zero-shot Domain Adaptation | Haifeng Xia; Zhengming Ding; | In this paper, we propose a novel algorithm, Hybrid Generative Network (HGNet) for Zero-shot Domain Adaptation, which embeds an adaptive feature separation (AFS) module into generative architecture. |
1197 | Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding | Kaihao Zhang; Wenhan Luo; Wenqi Ren; Jingwen Wang; Fang Zhao; Lin Ma; Hongdong Li; | In this paper, we present a Paired Rain Removal Network (PRRNet), which exploits both stereo images and semantic information. |
1198 | DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks | Hassan Dbouk; Hetul Sanghvi; Mahesh Mehendale; Naresh Shanbhag; | To this end, we present a novel fully differentiable non-uniform quantizer that can be seamlessly mapped onto efficient ternary-based dot product engines. |
1199 | All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling | Zhixiang Chi; Rasoul Mohammadi Nasiri; Zheng Liu; Juwei Lu; Jin Tang; Konstantinos N Plataniotis; | Departing from the state-of-the-art, this work introduces a true multi-frame interpolator. |
1200 | A Broader Study of Cross-Domain Few-Shot Learning | Yunhui Guo; Noel C. Codella; Leonid Karlinsky; James V. Codella; John R. Smith; Kate Saenko; Tajana Rosing; Rogerio Feris; | In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods. |
1201 | Practical Poisoning Attacks on Neural Networks | Junfeng Guo; Cong Liu; | This paper presents a new, practical targeted poisoning attack method on neural networks in vision domain, namely BlackCard. |
1202 | Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification | Djebril Mekhazni; Amran Bhuiyan; George Ekladious; Eric Granger; | In this paper, we propose a novel Dissimilarity-based Maximum Mean Discrepancy (D-MMD) loss for aligning pair-wise distances that can be optimized via gradient descent using relatively small batch sizes. |
1203 | Learn distributed GAN with Temporary Discriminators | Hui Qu; Yikai Zhang; Qi Chang; Zhennan Yan; Chao Chen; Dimitris Metaxas; | In this work, we propose a method for training distributed GAN with sequential temporary discriminators. |
1204 | SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems | Leo F Isikdogan; Bhavin V Nayak; Chyuan-Tyng Wu; Joao Peralta Moreira; Sushma Rao; Gilad Michael; | We propose a system comprised of fixed-topology neural networks having partially frozen weights, named SemifreddoNets. |
1205 | Improving Adversarial Robustness by Enforcing Local and Global Compactness | Anh Bui; Trung Le; He Zhao; Paul Montague; Olivier deVel; Tamas Abraham; Dinh Phung; | In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. |
1206 | TopoAL: An Adversarial Learning Approach for Topology-Aware Road Segmentation | Subeesh Vasu; Mateusz Kozinski; Leonardo Citraro; Pascal Fua; | To address this issue, we introduce an Adversarial Learning (AL) strategy tailored for our purposes. |
1207 | Channel selection using Gumbel Softmax | Charles Herrmann; Richard Strong Bowen; Ramin Zabih; | We propose a single end-to-end framework that can improve inference efficiency in both settings. |
1208 | Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification | Dripta S. Raychaudhuri; Amit K. Roy-Chowdhury; | In this paper, we propose a new framework named Temporal Consistency Progressive Learning, which uses temporal coherence as a novel self-supervised auxiliary task in the one-shot learning paradigm to capture such relationships amongst the unlabeled tracklets. |
1209 | An Efficient Training Framework for Reversible Neural Architectures | Zixuan Jiang; Keren Zhu; Mingjie Liu; Jiaqi Gu; David Z. Pan; | In this work, we formulate the decision problem for reversible operators with training time as the objective function and memory usage as the constraint. |
1210 | Box2Seg: Attention Weighted Loss and Discriminative Feature Learning for Weakly Supervised Segmentation | Viveka Kulharia; Siddhartha Chandra; Amit Agrawal; Philip Torr; Ambrish Tyagi; | We propose a weakly supervised approach to semantic segmentation using bounding box annotations. |
1211 | FreeCam3D: Snapshot Structured Light 3D with Freely-Moving Cameras | Yicheng Wu; Vivek Boominathan; Xuan Zhao; Jacob T. Robinson; Hiroshi Kawasaki; Aswin Sankaranarayanan; Ashok Veeraraghavan; | We propose a freeform structured light system that does not rigidly constrain camera(s) to the projector. |
1212 | One-Pixel Signature: Characterizing CNN Models for Backdoor Detection | Shanjiaoyang Huang; Weiqi Peng; Zhiwei Jia; Zhuowen Tu; | We tackle the convolutional neural network (CNN) backdoor detection problem by proposing a new representation called one-pixel signature. |
1213 | Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning | Linchao Zhu; Sercan Ö. Arık; Yi Yang; Tomas Pfister; | We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset. |
1214 | Structure-Aware Generation Network for Recipe Generation from Images | Hao Wang; Guosheng Lin; Steven C. H. Hoi; Chunyan Miao; | In this paper, we are interested in automatically generating cooking instructions for food. |
1215 | A Simple and Effective Framework for Pairwise Deep Metric Learning | Qi Qi; Yan Yan; Zixuan Wu; Xiaoyu Wang; Tianbao Yang; | In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar. |
1216 | Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner | Eugene Lee; Evan Chen; Chen-Yi Lee; | To cope with the unforeseeable distributional changes during deployment, we propose a transductive meta-learner that takes unlabeled samples during testing (deployment) for a self-supervised weight adjustment (also known as transductive inference), providing fast adaptation to the distributional changes. |
1217 | A Recurrent Transformer Network for Novel View Action Synthesis | Kara Marie Schatz; Erik Quintanilla; Shruti Vyas; Yogesh S Rawat; | In this work, we address the problem of synthesizing human actions from novel views. |
1218 | Multi-view Action Recognition using Cross-view Video Prediction | Shruti Vyas; Yogesh S Rawat; Mubarak Shah; | In this work, we address the problem of action recognition in a multi-view environment. |
1219 | Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation | Mingmin Zhen; Shiwei Li; Lei Zhou; Jiaxiang Shang; Haoan Feng; Tian Fang; Long Quan; | In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task. |
1220 | SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction | Sriram N N; Buyu Liu; Francesco Pittaluga; Manmohan Chandraker; | We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant time inference regardless of number of agents. |
1221 | Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation | Jinyu Yang; Weizhi An; Sheng Wang; Xinliang Zhu; Chaochao Yan; Junzhou Huang; | Here, we present an innovative framework, designed to mitigate the image translation bias and align cross-domain features with the same category. |
1222 | Efficient Outdoor 3D Point Cloud Semantic Segmentation for Critical Road Objects and Distributed Contexts | Chi-Chong Wong; Chi-Man Vong; | In this work, we propose a novel neural network model called Attention-based Dynamic Convolution Network with Self-Attention Global Contexts (ADConvnet-SAGC), which i) applies an attention mechanism to adaptively focus on the most related neighboring points for learning the point features of 3D objects, especially for small objects with diverse shapes; ii) applies a self-attention module for efficiently capturing long-range distributed contexts from the input; and iii) uses a more reasonable and compact architecture for efficient inference. |
1223 | Attributional Robustness Training using Input-Gradient Spatial Alignment | Mayank Singh; Nupur Kumari; Puneet Mangla; Abhishek Sinha; Vineeth N Balasubramanian; Balaji Krishnamurthy; | In this work, we study the problem of attributional robustness (i.e. models having robust explanations) by showing an upper bound for attributional vulnerability in terms of spatial correlation between the input image and its explanation map. |
1224 | Reducing the Sim-to-Real Gap for Event Cameras | Timo Stoffregen; Cedric Scheerlinck; Davide Scaramuzza; Tom Drummond; Nick Barnes; Lindsay Kleeman; Robert Mahony; | To address this, we present a new High Quality Frames (HQF) dataset, containing events and ground truth frames from a DAVIS240C that are well-exposed and minimally motion-blurred. |
1225 | Spatial Geometric Reasoning for Room Layout Estimation via Deep Reinforcement Learning | Liangliang Ren; Yangyang Song; Jiwen Lu; Jie Zhou; | We formulate the problem as a Markov decision process, in which the layout is incrementally adjusted based on the difference between the current layout and the target image, and the policy is learned via deep reinforcement learning. |
1226 | Learning Data Augmentation Strategies for Object Detection | Barret Zoph; Ekin D. Cubuk; Golnaz Ghiasi; Tsung-Yi Lin; Jonathon Shlens; Quoc V. Le; | First, we propose to use AutoAugment [3] to design better data augmentation strategies for object detection because it can address the difficulty of designing them. Second, we use the method to assess the value of data augmentation in object detection and compare it against the value of architecture. |
1227 | DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search | Xiyang Dai; Dongdong Chen; Mengchen Liu; Yinpeng Chen; Lu Yuan; | In this paper, we present DA-NAS that can directly search the architecture for large-scale target tasks while allowing a large candidate set in a more efficient manner. |
1228 | A Closer Look at Generalisation in RAVEN | Steven Spratley; Krista Ehinger; Tim Miller; | We revise the existing evaluation, and introduce two relational models, Rel-Base and Rel-AIR, that significantly improve this performance. |
1229 | Supervised Edge Attention Network for Accurate Image Instance Segmentation | Xier Chen; Yanchao Lian; Licheng Jiao; Haoran Wang; YanJie Gao; Shi Lingling; | To circumvent this issue, we propose a fully convolutional box head and a supervised edge attention module in mask head. |
1230 | Discriminative Partial Domain Adversarial Network | Jian Hu; Hongya Tuo; Chao Wang; Lingfeng Qiao; Haowen Zhong; Junchi Yan; Zhongliang Jing; Henry Leung; | In this paper, a novel Discriminative Partial Domain Adversarial Network (DPDAN) is developed. |
1231 | Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model | John Janiczek; Parth Thaker; Gautam Dasarathy; Christopher S. Edwards; Philip Christensen; Suren Jayasuriya; | In this paper, spectral variation is considered from a physics-based approach and incorporated into an end-to-end spectral unmixing algorithm via differentiable programming. |
1232 | Deep Cross-species Feature Learning for Animal Face Recognition via Residual Interspecies Equivariant Network | Xiao Shi; Chenxue Yang; Xue Xia; Xiujuan Chai; | In this work, we propose a novel Residual InterSpecies Equivariant Network (RiseNet) to deal with the animal face recognition task with limited training samples. |
1233 | Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes | Liang Liao; Jing Xiao; Zheng Wang; Chia-Wen Lin; Shin’ichi Satoh; | In this paper, we propose a Semantic Guidance and Evaluation Network (SGE-Net) to iteratively update the structural priors and the inpainted image in an interplay framework of semantics extraction and image inpainting. |
1234 | Sound2Sight: Generating Visual Dynamics from Sound and Context | Moitreya Chatterjee; Anoop Cherian; | In this paper, we study this problem in the context of audio-conditioned visual synthesis — a task that is important, for example, in occlusion reasoning. |
1235 | 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection | Jin Hyeok Yoo; Yecheol Kim; Jisong Kim; Jun Won Choi; | In this paper, we propose a new deep architecture for fusing camera and LiDAR sensors for 3D object detection. |
1236 | NoiseRank: Unsupervised Label Noise Reduction with Dependence Models | Karishma Sharma; Pinar Donmez; Enming Luo; Yan Liu; I. Zeki Yalniz; | In this paper, we propose NoiseRank, for unsupervised label noise reduction using Markov Random Fields (MRF). |
1237 | Fast Adaptation to Super-Resolution Networks via Meta-Learning | Seobin Park; Jinsu Yoo; Donghyeon Cho; Jiwon Kim; Tae Hyun Kim; | In this work, we observe the opportunity for further improvement of the performance of SISR without changing the architecture of conventional SR networks by practically exploiting additional information given from the input image. |
1238 | TP-LSD: Tri-Points Based Line Segment Detector | Siyu Huang; Fangbo Qin; Pengfei Xiong; Ning Ding; Yijia He; Xiao Liu; | This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed. |
1239 | SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation | Chenfeng Xu; Bichen Wu; Zining Wang; Wei Zhan; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka; | To fix this, we propose Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image. |
1240 | An Attention-driven Two-stage Clustering Method for Unsupervised Person Re-Identification | Zilong Ji; Xiaolong Zou; Xiaohan Lin; Xiao Liu; Tiejun Huang; Si Wu; | In the present study, we propose an attention-driven two-stage clustering (ADTC) method to solve this problem. |
1241 | Toward Fine-grained Facial Expression Manipulation | Jun Ling; Han Xue; Li Song; Shuhui Yang; Rong Xie; Xiao Gu; | In this study, we take these two objectives into consideration and propose a novel method. |
1242 | Adaptive Object Detection with Dual Multi-Label Prediction | Zhen Zhao; Yuhong Guo; Haifeng Shen; Jieping Ye; | In this paper, we propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection by exploiting multi-label object recognition as a dual auxiliary task. |
1243 | Table Structure Recognition using Top-Down and Bottom-Up Cues | Sachin Raja; Ajoy Mondal; C V Jawahar; | In our work, we focus on tables that have complex structures, dense content, and varying layouts with no dependency on meta-features and/or OCR. |
1244 | Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder | Mingyu Yin; Li Sun; Qingli Li; | This paper proposes a view translation model within cVAE-GAN framework for the purpose of unpaired training. |
1245 | Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments | Jacob Krantz; Erik Wijmans; Arjun Majumdar; Dhruv Batra; Stefan Lee; | We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions. |
1246 | Boundary Content Graph Neural Network for Temporal Action Proposal Generation | Yueran Bai; Yingying Wang; Yunhai Tong; Yang Yang; Qiyue Liu; Junhui Liu; | To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) to model the insightful relations between the boundary and action content of temporal proposals by the graph neural networks. |
1247 | Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition | Yunhao Ge; Jiaping Zhao; Laurent Itti; | Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously. |
1248 | VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval | Minuk Ma; Sunjae Yoon; Junyeong Kim; Youngjoon Lee; Sunghun Kang; Chang D. Yoo; | This paper explores a method for performing VMR in a weakly-supervised manner (wVMR): training is performed without temporal moment labels but only with the text query that describes a segment of the video. |
1249 | Attention-Based Query Expansion Learning | Albert Gordo; Filip Radenovic; Tamara Berg; | In this paper we propose a more principled framework for query expansion, where one trains, in a discriminative manner, a model that learns how images should be aggregated to form the expanded query. |
1250 | Interpretable Foreground Object Search As Knowledge Distillation | Boren Li; Po-Yu Zhuang; Jian Gu; Mingyang Li; Ping Tan; | This paper proposes a knowledge distillation method for foreground object search (FoS). |
1251 | Improving Knowledge Distillation via Category Structure | Zailiang Chen; Xianxian Zheng; Hailan Shen; Ziyang Zeng; Yukun Zhou; Rongchang Zhao; | In this paper, a novel Category Structure is proposed to transfer category-level structured relations for knowledge distillation. |
1252 | High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images | Stephan J. Garbin; Marek Kowalski; Matthew Johnson; Jamie Shotton; | In this work, we propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model which, in turn, maps the vector to a photorealistic image of a person of the same pose, expression, hair, and lighting. |
1253 | Attentive Prototype Few-shot Learning with Capsule Network-based Embedding | Fangyu Wu; Jeremy S. Smith; Wenjin Lu; Chaoyi Pang; Bailing Zhang; | Our contributions include (1) a new embedding structure to encode relative spatial relationships between features by applying a capsule network, (2) a new triplet loss designed to enhance the semantic feature embedding, where similar samples are close to each other while dissimilar samples are farther apart, and (3) an effective non-parametric classifier termed attentive prototypes in place of the simple prototypes in current few-shot learning. |
1254 | Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances | Aditya Arun; C.V. Jawahar; M. Pawan Kumar; | Unlike previous approaches, we explicitly model the uncertainty in the pseudo label generation process using a conditional distribution. |
1255 | DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving | Yao Zhou; Guowei Wan; Shenhua Hou; Li Yu; Gang Wang; Xiaofei Rui; Shiyu Song; | We present a visual localization framework based on novel deep attention aware features for autonomous driving that achieves centimeter level localization accuracy. |
1256 | Visual-Relation Conscious Image Generation from Structured-Text | Duc Minh Vo; Akihiro Sugimoto; | We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs. |
1257 | Patch-wise Attack for Fooling Deep Neural Network | Lianli Gao; Qilong Zhang; Jingkuan Song; Xianglong Liu; Heng Tao Shen; | Motivated by this, we propose a patch-wise iterative algorithm – a black-box attack towards mainstream normally trained and defense models, which differs from the existing attack methods manipulating pixel-wise noise. |
1258 | Feature Pyramid Transformer | Dong Zhang; Hanwang Zhang; Jinhui Tang; Meng Wang; Xiansheng Hua; Qianru Sun; | To this end, we propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT). |
1259 | MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module | Jiabin Xing; Zhi Qi; Jiying Dong; Jiaxuan Cai; Hao Liu; | To address the issue, we propose two compact stereo networks, MABNet and its light version MABNet_tiny. |
1260 | Guided Saliency Feature Learning for Person Re-identification in Crowded Scenes | Lingxiao He; Wu Liu; | In this paper, we propose a simple occlusion-aware approach to address the problem. |
1261 | Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection | Miao Zhang; Sun Xiao Fei; Jie Liu; Shuang Xu; Yongri Piao; Huchuan Lu; | In this paper, we propose an asymmetric two-stream architecture taking account of the inherent differences between RGB and depth data for saliency detection. |
1262 | Explaining Image Classifiers using Statistical Fault Localization | Youcheng Sun; Hana Chockler; Xiaowei Huang; Daniel Kroening; | In this paper, we show that statistical fault localization (SFL) techniques from software engineering deliver high quality explanations of the outputs of DNNs, where we define an explanation as a minimal subset of features sufficient for making the same decision as for the original input. |
1263 | Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers | Michal Rolínek; Paul Swoboda; Dominik Zietlow; Anselm Paulus; Vít Musil; Georg Martius; | Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers. |
1264 | Learning Video Representations by Transforming Time | Simon Jenni; Givi Meishvili; Paolo Favaro; | We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics. |
1265 | Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation | Madhu Vankadari; Sourav Garg; Anima Majumder; Swagat Kumar; Ardhendu Behera; | In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. |
1266 | Variational Connectionist Temporal Classification | Linlin Chao; Jingdong Chen; Wei Chu; | To remedy this, we propose variational CTC (Var-CTC) to enhance the learning of non-blank symbols. |
1267 | End-to-end Dynamic Matching Network for Multi-view Multi-person 3d Pose Estimation | Congzhentao Huang; Shuai Jiang; Yang Li; Ziyue Zhang; Jason Traish; Chen Deng; Sam Ferguson; Richard Yi Da Xu; | To address this phenomenon, we propose a novel end-to-end training scheme that brings the three separate modules into a single model. |
1268 | Orderly Disorder in Point Cloud Domain | Morteza Ghahremani; Bernard Tiddeman; Yonghuai Liu; Ardhendu Behera; | In this paper, we propose a smart yet simple deep network for analysis of 3D models using ‘orderly disorder’ theory. |
1269 | Deep Decomposition Learning for Inverse Imaging Problems | Dongdong Chen; Mike E. Davies; | In this paper, inspired by the geometry that data can be decomposed by two components from the null-space of the forward operator and the range space of its pseudo-inverse, we train neural networks to learn the two components and therefore learn the decomposition, i.e. we explicitly reformulate the neural network layers as learning range-nullspace decomposition functions with reference to the layer inputs, instead of learning unreferenced functions. |
1270 | FLOT: Scene Flow on Point Clouds guided by Optimal Transport | Gilles Puy; Alexandre Boulch; Renaud Marlet; | We propose and study a method called FLOT that estimates scene flow on point clouds. |
1271 | Accurate Reconstruction of Oriented 3D Points using Affine Correspondences | Carolina Raposo; Joao P. Barreto; | This article provides new formulations for achieving epipolar geometry-consistent ACs, that, besides leading to linear solvers that are up to 30× faster than the state-of-the-art alternatives, allow for a fast refinement scheme that significantly improves the quality of the noisy ACs. |
1272 | Volumetric Transformer Networks | Seungryong Kim; Sabine Süsstrunk; Mathieu Salzmann; | To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. |
1273 | 360° Camera Alignment via Segmentation | Benjamin Davidson; Mohsan S. Alvi; João F. Henriques; | In this work, we investigate how to solve this problem by fusing purely geometric cues, such as apparent vanishing points, with learned semantic cues, such as the expectation that some visual elements (e.g. doors) have a natural upright position. |
1274 | A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval | Bin Wang; Yongsheng Gao; | Although its extended version, the trace transform, allows us to construct affine invariants, these are less informative and computationally expensive due to the loss of spatial relationship between trace lines and the extensive repeated calculation of the transform. To address this issue, a novel line integral transform is proposed. |
1275 | Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks | Federico Baldassarre; Kevin Smith; Josephine Sullivan; Hossein Azizpour; | This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels. |
1276 | Guided Semantic Flow | Sangryul Jeon; Dongbo Min; Seungryong Kim; Jihwan Choe; Kwanghoon Sohn; | To address such severe matching ambiguities, we introduce a novel approach, called guided semantic flow, based on the key insight that sparse yet reliable matches can effectively capture non-rigid geometric variations, and these confident matches can guide adjacent pixels to have similar solution spaces, reducing the matching ambiguities significantly. |
1277 | Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation | Mausoom Sarkar; Milan Aggarwal; Arneh Jain; Hiresh Gupta; Balaji Krishnamurthy; | In this paper, we share our findings on employing a hierarchical semantic segmentation network for this task of structure extraction. |
1278 | Measuring the Importance of Temporal Features in Video Saliency | Matthias Tangemann; Matthias Kümmerer; Thomas S.A. Wallis; Matthias Bethge; | In this work, we test this assumption by quantifying to which extent gaze on recent video saliency benchmarks can be predicted by a static baseline model. |
1279 | Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution | Haotian Tang; Zhijian Liu; Shengyu Zhao; Yujun Lin; Ji Lin; Hanrui Wang; Song Han; | To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. |
1280 | Towards Reliable Evaluation of Algorithms for Road Network Reconstruction from Aerial Images | Leonardo Citraro; Mateusz Koziński; Pascal Fua; | To provide more reliable evaluation, we design three new metrics that are sensitive to all classes of errors. |
1281 | Online Continual Learning under Extreme Memory Constraints | Enrico Fini; Stéphane Lathuilière; Enver Sangineto; Moin Nabi; Elisa Ricci; | In this paper, we introduce the novel problem of Memory-Constrained Online Continual Learning (MC-OCL) which imposes strict constraints on the memory overhead that a possible algorithm can use to avoid catastrophic forgetting. |
1282 | Learning to Cluster under Domain Shift | Willi Menapace; Stéphane Lathuilière; Elisa Ricci; | In this work we overcome this assumption and we address the problem of transferring knowledge from a source to a target domain when both source and target data have no annotations. |
1283 | Defense Against Adversarial Attacks via Controlling Gradient Leaking on Embedded Manifolds | Yueru Li; Shuyu Cheng; Hang Su; Jun Zhu; | In this paper, we present a new perspective, namely gradient leaking hypothesis, to understand the existence of adversarial examples and to further motivate effective defense strategies. |
1284 | Improving Optical Flow on a Pyramid Level | Markus Hofinger; Samuel Rota Bulò; Lorenzo Porzi; Arno Knapitsch; Thomas Pock; Peter Kontschieder; | In this work we review the coarse-to-fine spatial feature pyramid concept, which is used in state-of-the-art optical flow estimation networks to make exploration of the pixel flow search space computationally tractable and efficient. |
1285 | Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations | Sungheon Park; Minsik Lee; Nojun Kwak; | We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects when only 2D annotations are available as ground truths. |
1286 | Learning to Learn Parameterized Classification Networks for Scalable Input Images | Duo Li; Anbang Yao; Qifeng Chen; | To achieve efficient and flexible image classification at runtime, we employ meta learners to generate convolutional weights of main networks for various input scales and maintain privatized Batch Normalization layers per scale. |
1287 | Stereo Event-based Particle Tracking Velocimetry for 3D Fluid Flow Reconstruction | Yuanhao Wang; Ramzi Idoughi; Wolfgang Heidrich; | In this paper, we present a new framework that retrieves dense 3D measurements of the fluid velocity field using a pair of event-based cameras. |
1288 | Simplicial Complex based Point Correspondence between Images warped onto Manifolds | Charu Sharma; Manohar Kaul; | In this paper, we pose the assignment problem as finding a bijective map between two graph induced simplicial complexes, which are higher-order analogues of graphs. |
1289 | Representation Learning on Visual-Symbolic Graphs for Video Understanding | Effrosyni Mavroudi; Benjamín Béjar Haro; René Vidal; | To capture this rich visual and semantic context, we propose using two graphs: (1) an attributed spatio-temporal visual graph whose nodes correspond to actors and objects and whose edges encode different types of interactions, and (2) a symbolic graph that models semantic relationships. |
1290 | Distance-Normalized Unified Representation for Monocular 3D Object Detection | Xuepeng Shi; Zhixiang Chen; Tae-Kyun Kim; | To achieve fast and accurate monocular 3D object detection, we introduce a single-stage and multi-scale framework to learn a unified representation for objects within different distance ranges, termed as UR3D. |
1291 | Sequential Deformation for Accurate Scene Text Detection | Shanyu Xiao; Liangrui Peng; Ruijie Yan; Keyu An; Gang Yao; Jaesik Min; | In this paper, we propose a novel sequential deformation method to effectively model the line-shape of scene text. |
1292 | Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration | Yiming Wang; Alessio Del Bue; | In this work we address the problem of autonomous 3D exploration of an unknown indoor environment using a depth camera. |
1293 | Semi-Supervised Segmentation based on Error-Correcting Supervision | Robert Mendel; Luis Antonio de Souza Jr; David Rauber; João Paulo Papa; Christoph Palm; | In this work, we augment such supervised segmentation models by allowing them to learn from unlabeled data. |
1294 | Quantum-soft QUBO Suppression for Accurate Object Detection | Junde Li; Swaroop Ghosh; | In this paper, we first map the task of removing redundant detections into Quadratic Unconstrained Binary Optimization (QUBO) framework that consists of detection score from each bounding box and overlap ratio between pair of bounding boxes. Next, we solve the QUBO problem using the proposed Quantum-soft QUBO Suppression algorithm for fast and accurate detection by exploiting quantum computing advantages. |
1295 | Label-similarity Curriculum Learning | Ürün Dogan; Aniket Anand Deshmukh; Marcin Bronislaw Machura; Christian Igel; | We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation. |
1296 | Recurrent Image Annotation With Explicit Inter-Label Dependencies | Ayushi Dutta; Yashaswi Verma; C.V. Jawahar; | In this paper, we address this limitation and propose a novel approach in which the RNN is explicitly forced to learn multiple relevant inter-label dependencies, without the need of feeding the ground-truth in any particular order. |
1297 | Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution | Jing Yao; Danfeng Hong; Jocelyn Chanussot; Deyu Meng; Xiaoxiang Zhu; Zongben Xu; | To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution multispectral image (MSI). |
1298 | SimPose: Effectively Learning DensePose and Surface Normals of People from Simulated Data | Tyler Zhu; Per Karlsson; Christoph Bregler; | With a proliferation of generic domain-adaptation approaches, we report a simple yet effective technique for learning difficult per-pixel 2.5D and 3D regression representations of articulated people. |
1299 | ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images | Yu-Hui Lee; Shang-Hong Lai; | In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images. |
1300 | Differentiable Joint Pruning and Quantization for Hardware Efficiency | Ying Wang; Yadong Lu; Tijmen Blankevoort; | We present a differentiable joint pruning and quantization (DJPQ) scheme. |
1301 | Learning to Generate Customized Dynamic 3D Facial Expressions | Rolandos Alexandros Potamias; Jiali Zheng; Stylianos Ploumpis; Giorgos Bouritsas; Evangelos Ververas; Stefanos Zafeiriou; | In this paper, we extrapolate those advances to the 3D domain, by studying 3D image-to-video translation with a particular focus on 4D facial expressions. |
1302 | LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors | Jan Brejcha; Michal Lukáč; Yannick Hold-Geoffroy; Oliver Wang; Martin Čadík; | We introduce a solution to large scale Augmented Reality for outdoor scenes by registering camera images to textured Digital Elevation Models (DEMs). |
1303 | Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration | Xin Li; Xin Jin; Jianxin Lin; Sen Liu; Yaojun Wu; Tao Yu; Wei Zhou; Zhibo Chen; | To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions. |
1304 | Jointly De-biasing Face Recognition and Demographic Attribute Estimation | Sixue Gong; Xiaoming Liu; Anil K. Jain; | We present a novel de-biasing adversarial network (DebFace) that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation. |
1305 | Regularized Loss for Weakly Supervised Single Class Semantic Segmentation | Olga Veksler; | We propose a new weakly supervised method for training CNNs to segment an object of a single class of interest. |
1306 | Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks | Chankyu Lee; Adarsh Kumar Kosta; Alex Zihao Zhu; Kenneth Chaney; Kostas Daniilidis; Kaushik Roy; | To overcome these issues, we present Spike-FlowNet, a deep hybrid neural network architecture integrating SNNs and ANNs for efficiently estimating optical flow from sparse event camera outputs without sacrificing the performance. |
1307 | Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations | Aditya Golatkar; Alessandro Achille; Stefano Soatto; | We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions, and can be extended to ensure forgetting in the final activations of the network. |
1308 | Inherent Adversarial Robustness of Deep Spiking Neural Networks: Effects of Discrete Input Encoding and Non-Linear Activations | Saima Sharmin; Nitin Rathi; Priyadarshini Panda; Kaushik Roy; | In this work, we demonstrate that adversarial accuracy of SNNs under gradient-based attacks is higher than their non-spiking counterparts for CIFAR datasets on deep VGG and ResNet architectures, particularly in blackbox attack scenario. |
1309 | Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks | Baris Gecer; Alexandros Lattas; Stylianos Ploumpis; Jiankang Deng; Athanasios Papaioannou; Stylianos Moschoglou; Stefanos Zafeiriou; | In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. |
1310 | Learning to Learn Words from Visual Scenes | Dídac Surís; Dave Epstein; Heng Ji; Shih-Fu Chang; Carl Vondrick; | We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes. |
1311 | On Transferability of Histological Tissue Labels in Computational Pathology | Mahdi S. Hosseini; Lyndon Chan; Weimin Huang; Yichen Wang; Danial Hasan; Corwyn Rowsell; Savvas Damaskinos; Konstantinos N. Plataniotis; | In this paper, we explore the possibility of transferring diagnostically-relevant histology labels from a source-domain into multiple target-domains to classify similar tissue structures and cancer grades. |
1312 | Learning Actionness via Long-range Temporal Order Verification | Dimitri Zhukov; Jean-Baptiste Alayrac; Ivan Laptev; Josef Sivic; | To address these challenges, we here propose a self-supervised and generic method to isolate actions from their background. |
1313 | Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays | Laurie Bose; Piotr Dudek; Jianing Chen; Stephen J. Carey; Walterio W. Mayol-Cuevas; | We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities. |
1314 | Character Region Attention For Text Spotting | Youngmin Baek; Seung Shin; Jeonghun Baek; Sungrae Park; Junyeop Lee; Daehyun Nam; Hwalsuk Lee; | Based on the insight, we construct a tightly coupled single pipeline model. |
1315 | Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network | Anh-Huy Phan; Konstantin Sobolev; Konstantin Sozykin; Dmitry Ermilov; Julia Gusak; Petr Tichavský; Valeriy Glukhov; Ivan Oseledets; Andrzej Cichocki; | We present a novel method, which can stabilize the low-rank approximation of convolutional kernels and ensure efficient compression while preserving the high-quality performance of the neural networks. |
1316 | Dual Mixup Regularized Learning for Adversarial Domain Adaptation | Yuan Wu; Diana Inkpen; Ahmed El-Roby; | In order to alleviate the above issues, we propose a dual mixup regularized learning (DMRL) method for UDA, which not only guides the classifier in enhancing consistent predictions in-between samples, but also enriches the intrinsic structures of the latent space. |
1317 | Robust and On-the-fly Dataset Denoising for Image Classification | Jiaming Song; Yann Dauphin; Michael Auli; Tengyu Ma; | We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set. |
1318 | Imaging Behind Occluders Using Two-Bounce Light | Connor Henley; Tomohiro Maeda; Tristan Swedish; Ramesh Raskar; | We introduce the new non-line-of-sight imaging problem of imaging behind an occluder. |
1319 | Improving Object Detection with Selective Self-Supervised Self-Training | Yandong Li; Di Huang; Danfeng Qin; Liqiang Wang; Boqing Gong; | To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images. |
1320 | Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction | Rohan Chabra; Jan E. Lenssen; Eddy Ilg; Tanner Schmidt; Julian Straub; Steven Lovegrove; Richard Newcombe; | To address this problem we introduce Deep Local Shapes (DeepLS), a deep shape representation that enables high-quality 3D shape representation without prohibitive memory requirements. |
1321 | Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning | Aditya Sanghi; | To solve these issues we propose to extend the InfoMax and contrastive learning principles on 3D shapes. |
1322 | Adversarial Data Augmentation via Deformation Statistics | Sahin Olut; Zhengyang Shen; Zhenlin Xu; Samuel Gerber; Marc Niethammer; | To that end, we explore an augmentation strategy which builds statistical deformation models from unlabeled data via principal component analysis and uses the resulting statistical deformation space to augment the labeled training samples. |
1323 | Neural Predictor for Neural Architecture Search | Wei Wen; Hanxiao Liu; Yiran Chen; Hai Li; Gabriel Bender; Pieter-Jan Kindermans; | We propose an approach with three basic steps that is conceptually much simpler. |
1324 | Learning Permutation Invariant Representations using Memory Networks | Shivam Kalra; Mohammed Adnan; Graham Taylor; H.R. Tizhoosh; | In this work, we present a permutation invariant neural network called Memory-based Exchangeable Model (MEM) for learning universal set functions. |
1325 | Feature Space Augmentation for Long-Tailed Data | Peng Chu; Xiao Bian; Shaopeng Liu; Haibin Ling; | In this work, we present a novel approach to address the long-tailed problem by augmenting the under-represented classes in the feature space with the features learned from the classes with ample samples. |
1326 | Laying the Foundations of Deep Long-Term Crowd Flow Prediction | Samuel S. Sohn; Honglu Zhou; Seonghyeon Moon; Sejong Yoon; Vladimir Pavlovic; Mubbasir Kapadia; | We propose the first deep framework to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments. |
1327 | Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning | Zhekun Luo; Devin Guillory; Baifeng Shi; Wei Ke; Fang Wan; Trevor Darrell; Huijuan Xu; | In this work, we explicitly model the key instances assignment as a hidden variable and adopt an Expectation-Maximization (EM) framework. |
1328 | Fairness by Learning Orthogonal Disentangled Representations | Mhd Hasan Sarhan; Nassir Navab; Abouzar Eslami; Shadi Albarqouni; | In this paper, we propose a novel disentanglement approach to the invariant representation problem. |
1329 | Self-supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation | Cheng Ouyang; Carlo Biffi; Chen Chen; Turkay Kart; Huaqi Qiu; Daniel Rueckert; | To address this problem we make several contributions: (1) A novel self-supervised FSS framework for medical images in order to eliminate the requirement for annotations during training. |
1330 | On Diverse Asynchronous Activity Anticipation | He Zhao; Richard P. Wildes; | We investigate the joint anticipation of long-term activity labels and their corresponding times with the aim of improving both the naturalness and diversity of predictions. We address these matters using Conditional Adversarial Generative Networks for Discrete Sequences. |
1331 | Representative-Discriminative Learning for Open-set Land Cover Classification of Satellite Imagery | Razieh Kaviani Baghbaderani; Ying Qu; Hairong Qi; Craig Stutts; | In this paper, we study the problem of open-set land cover classification that identifies the samples belonging to unknown classes during testing, while maintaining performance on known classes. |
1332 | Structure-Aware Human-Action Generation | Ping Yu; Yang Zhao; Chunyuan Li; Junsong Yuan; Changyou Chen; | To overcome this challenge, we propose a variant of GCNs to leverage the self-attention mechanism to prune a complete action graph in the temporal space. |
1333 | Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition | Niamul Quader; Juwei Lu; Peng Dai; Wei Li; | First, we systematically yield enhanced receptive fields for complementary feature extraction via coarse-to-fine decomposition of input imagery along the spatial and temporal dimensions, and adaptively focus on training important feature pathways using a reparameterized fully connected layer. Second, we develop a 'use when needed' scheme with a 'coarse-exit' strategy that allows selective use of expensive high-resolution processing in a data-dependent fashion to retain accuracy while reducing computation cost. |
1334 | S³Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data | Bin Cheng; Inderjot Singh Saggu; Raunak Shah; Gaurav Bansal; Dinesh Bharadia; | We present S³Net, a self-supervised framework which combines these complementary features: we use synthetic and real-world images for training while exploiting geometric, temporal, as well as semantic constraints. |
1335 | Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning | Maunil R Vyas; Hemanth Venkateswara; Sethuraman Panchanathan; | To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss). |
1336 | Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks | Niamul Quader; Md Mafijul Islam Bhuiyan; Juwei Lu; Peng Dai; Wei Li; | We propose novel approaches for simultaneously identifying important weights of a convolutional neural network (ConvNet) and providing more attention to the important weights during training. |
1337 | UNITER: UNiversal Image-TExt Representation Learning | Yen-Chun Chen; Linjie Li; Licheng Yu; Ahmed El Kholy; Faisal Ahmed; Zhe Gan; Yu Cheng; Jingjing Liu; | In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. |
1338 | Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | Xiujun Li; Xi Yin; Chunyuan Li; Pengchuan Zhang; Xiaowei Hu; Lei Zhang; Lijuan Wang; Houdong Hu; Li Dong; Furu Wei; Yejin Choi; Jianfeng Gao; | While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute force manner, in this paper, we propose a new learning method Oscar, which uses object tags detected in images as anchor points to significantly ease the learning of alignments. |
1339 | Improving Face Recognition from Hard Samples via Distribution Distillation Loss | Yuge Huang; Pengcheng Shen; Ying Tai; Shaoxin Li; Xiaoming Liu; Jilin Li; Feiyue Huang; Rongrong Ji; | To improve the performance on hard samples, we propose a novel Distribution Distillation Loss to narrow the performance gap between easy and hard samples, which is simple, effective and generic for various types of facial variations. |
1340 | Extract and Merge: Superpixel Segmentation with Regional Attributes | Jianqiao An; Yucheng Shi; Yahong Han; Meijun Sun; Qi Tian; | In this work, we propose the concept of regional attribute, which indicates the location of a certain region in the object. |
1341 | Spatial-Adaptive Network for Single Image Denoising | Meng Chang; Qi Li; Huajun Feng; Zhihai Xu; | In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single image blind noise removal. |
1342 | Physics-based Feature Dehazing Networks | Jiangxin Dong; Jinshan Pan; | We propose a physics-based feature dehazing network for image dehazing. |
1343 | Learning Surrogates via Deep Embedding | Yash Patel; Tomáš Hodaň; Jiří Matas; | This paper proposes a technique for training neural networks by minimizing surrogate losses that approximate the target evaluation metric, which may be non-differentiable. |
1344 | An Asymmetric Modeling for Action Assessment | Jibin Gao; Wei-Shi Zheng; Jia-Hui Pan; Chengying Gao; Yaowei Wang; Wei Zeng; Jianhuang Lai; | In this work, we model the asymmetric interactions among agents for action assessment. |
1345 | High-quality Single-model Deep Video Compression with Frame-Conv3D and Multi-frame Differential Modulation | Wenyu Sun; Chen Tang; Weigui Li; Zhuqing Yuan; Huazhong Yang; Yongpan Liu; | This paper proposes a deep video compression method to simultaneously encode multiple frames with Frame-Conv3D and differential modulation. |
1346 | Instance-Aware Embedding for Point Cloud Instance Segmentation | Tong He; Yifan Liu; Chunhua Shen; Xinlong Wang; Changming Sun; | In this work, we study the influence of instance-aware knowledge by proposing an Instance-Aware Module (IAM). |
1347 | Self-Paced Deep Regression Forests with Consideration on Underrepresented Examples | Lili Pan; Shijie Ai; Yazhou Ren; Zenglin Xu; | To this end, this paper proposes a new deep discriminative model, self-paced deep regression forests with consideration on underrepresented examples (SPUDRFs). |
1348 | Manifold Projection for Adversarial Defense on Face Recognition | Jianli Zhou; Chao Liang; Jun Chen; | In this paper, we propose Adversarial Variational AutoEncoder (A-VAE), a novel framework to tackle both types of attacks. |
1349 | Weakly Supervised Learning with Side Information for Noisy Labeled Images | Lele Cheng; Xiangzeng Zhou; Liming Zhao; Dangwei Li; Hong Shang; Yun Zheng; Pan Pan; Yinghui Xu; | In this paper, we present an efficient weakly-supervised learning method using a Side Information Network (SINet), which aims to effectively carry out large-scale classification with severely noisy labels. |
1350 | Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision | Peng Wu; Jing Liu; Yujia Shi; Yujia Sun; Fangtao Shao; Zhaoyang Wu; Zhiwei Yang; | To address this problem, in this work we first release a large-scale and multi-scene dataset named XD-Violence with a total duration of 217 hours, containing 4754 untrimmed videos with audio signals and weak labels. |
1351 | SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection | Rui Fan; Hengli Wang; Peide Cai; Ming Liu; | Hence, in this paper, we first introduce a novel module, named surface normal estimator (SNE), which can infer surface normal information from dense depth/disparity images with high accuracy and efficiency. Furthermore, we propose a data-fusion CNN architecture, referred to as RoadSeg, which can extract and fuse features from both RGB images and the inferred surface normal information for accurate freespace detection. |
1352 | Modeling the Space of Point Landmark Constrained Diffeomorphisms | Chengfeng Wen; Yang Guo; Xianfeng Gu; | In order to fulfill these requirements, this work proposes a novel model of the space of point landmark constrained diffeomorphisms. |
1353 | PieNet: Personalized Image Enhancement Network | Han-Ul Kim; Young Jun Koh; Chang-Su Kim; | In this paper, we propose the first deep learning approach to personalized image enhancement, which can enhance new images for a new user, by asking him or her to select about 10 to 20 preferred images from a random set of images. |
1354 | Rotational Outlier Identification in Pose Graphs Using Dual Decomposition | Arman Karimian; Ziqi Yang; Roberto Tron; | In this paper, we contribute to the state of the art of the latter, by proposing a method to detect incorrect orientation measurements prior to pose graph optimization by checking the geometric consistency of rotation measurements. |
1355 | Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture | Dipanjan Das; Sandika Biswas; Sanjana Sinha; Brojeshwar Bhowmick; | In this paper, we propose a novel strategy where we partition the problem and learn the motion and texture separately. |
1356 | Solving Phase Retrieval with a Learned Reference | Rakib Hyder; Zikui Cai; M. Salman Asif; | In this paper, we assume that a known (learned) reference is added to the signal before capturing the Fourier amplitude measurements. Our method is inspired by the principle of adding a reference signal in holography. |
1357 | Dual Grid Net: Hand Mesh Vertex Regression from Single Depth Maps | Chengde Wan; Thomas Probst; Luc Van Gool; Angela Yao; | We aim to recover the dense 3D surface of the hand from depth maps and propose a network that can predict mesh vertices, transformation matrices for every joint and joint coordinates in a single forward pass. |