Paper Digest: ICCV 2015 Highlights
The International Conference on Computer Vision (ICCV) is one of the top computer vision conferences in the world. In 2015, it was held in Santiago, Chile.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free Paper Digest service to get daily updates on new papers customized to your interests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICCV 2015 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images | Mateusz Malinowski, Marcus Rohrbach, Mario Fritz | By combining latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation to this problem for which all parts are trained jointly. |
2 | Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing | Hamid Izadinia, Fereshteh Sadeghi, Santosh K. Divvala, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi | We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. |
3 | Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books | Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler | To align movies and books we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. |
4 | Learning Query and Image Similarities With Ranking Canonical Correlation Analysis | Ting Yao, Tao Mei, Chong-Wah Ngo | We demonstrate in this paper that the above two limitations can be well mitigated by jointly exploring subspace learning and the use of click-through data. |
5 | Learning to See by Moving | Pulkit Agrawal, Joao Carreira, Jitendra Malik | Drawing inspiration from this observation, in this work we investigated if the awareness of egomotion (i.e., self-motion) can be used as a supervisory signal for feature learning. |
6 | Object Detection Using Generalization and Efficiency Balanced Co-Occurrence Features | Haoyu Ren, Ze-Nian Li | In this paper, we propose a high-accuracy object detector based on co-occurrence features. |
7 | Mining And-Or Graphs for Graph Matching and Object Discovery | Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu | Given a set of attributed relational graphs (ARGs), we propose to use a hierarchical And-Or Graph (AoG) to model the pattern of maximal-size common subgraphs embedded in the ARGs, and we develop a general method to mine the AoG model from the unlabeled ARGs. |
8 | Pose Induction for Novel Object Categories | Shubham Tulsiani, Joao Carreira, Jitendra Malik | We present a generalized classifier that can reliably induce pose given a single instance of a novel category. |
9 | Dynamic Texture Recognition via Orthogonal Tensor Dictionary Learning | Yuhui Quan, Yan Huang, Hui Ji | To overcome these obstacles, we proposed a structured tensor dictionary learning method for sparse coding, which learns a dictionary structured with orthogonality and separability. |
10 | Convolutional Channel Features | Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li | In this paper, we revisit two widely used approaches in computer vision, namely filtered channel features and Convolutional Neural Networks (CNN), and absorb merits from both by proposing an integrated method called Convolutional Channel Features (CCF). |
11 | Local Convolutional Features With Unsupervised Training for Image Retrieval | Mattis Paulin, Matthijs Douze, Zaid Harchaoui, Julien Mairal, Florent Perronnin, Cordelia Schmid | We present a comparison framework to benchmark current deep convolutional approaches along with Patch-CKN for both patch and image retrieval, including our novel “RomePatches” dataset. |
12 | RIDE: Reversal Invariant Descriptor Enhancement | Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian | In this paper, we present RIDE (Reversal Invariant Descriptor Enhancement) for fine-grained object recognition. |
13 | Discrete Tabu Search for Graph Matching | Kamil Adamczewski, Yumin Suh, Kyoung Mu Lee | In this paper, we propose a novel graph matching algorithm based on tabu search. |
14 | Discriminative Learning of Deep Convolutional Feature Point Descriptors | Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, Francesc Moreno-Noguer | In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. |
15 | Amodal Completion and Size Constancy in Natural Scenes | Abhishek Kar, Shubham Tulsiani, Joao Carreira, Jitendra Malik | Here we propose to tackle these issues by building upon advances in object recognition and using recently created large-scale datasets. |
16 | Learning Where to Position Parts in 3D | Marco Pedersoli, Tinne Tuytelaars | In this paper we propose a new method for the detection and pose estimation of 3D objects, that does not use any 3D CAD model or other 3D information. |
17 | Query Adaptive Similarity Measure for RGB-D Object Recognition | Yanhua Cheng, Rui Cai, Chi Zhang, Zhiwei Li, Xin Zhao, Kaiqi Huang, Yong Rui | This paper studies the problem of improving the top-1 accuracy of RGB-D object recognition. |
18 | Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines | Chao Sui, Mohammed Bennamoun, Roberto Togneri | This paper presents a novel feature learning method for visual speech recognition using Deep Boltzmann Machines (DBM). |
19 | Cluster-Based Point Set Saliency | Flora Ponjou Tasse, Jiri Kosinka, Neil Dodgson | We propose a cluster-based approach to point set saliency detection, a challenge since point sets lack topological information. |
20 | A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | Shida Beigpour, Andreas Kolb, Sven Kunz | In this paper, we provide a new, real photo dataset with precise ground-truth for intrinsic image research. We provide full per-pixel intrinsic ground-truth data for these scenarios, i.e. reflectance, specularity, shading, and illumination for scenes as well as preliminary depth information. |
21 | PatchMatch-Based Automatic Lattice Detection for Near-Regular Textures | Siying Liu, Tian-Tsong Ng, Kalyan Sunkavalli, Minh N. Do, Eli Shechtman, Nathan Carr | In this work, we investigate the problem of automatically inferring the lattice structure of near-regular textures (NRT) in real-world images. |
22 | A Data-Driven Metric for Comprehensive Evaluation of Saliency Models | Jia Li, Changqun Xia, Yafei Song, Shu Fang, Xiaowu Chen | To address this problem, we propose a data-driven metric for comprehensive evaluation of saliency models. |
23 | A Matrix Decomposition Perspective to Multiple Graph Matching | Junchi Yan, Hongteng Xu, Hongyuan Zha, Xiaokang Yang, Huanxi Liu, Stephen Chu | Our method aims to extract the common inliers and their synchronized permutations from disordered weighted graphs in the presence of deformation and outliers. |
24 | Fast and Effective L0 Gradient Minimization by Region Fusion | Rang M. H. Nguyen, Michael S. Brown | In this paper, we present a new method to perform L_0 gradient minimization that is fast and effective. |
25 | Generic Promotion of Diffusion-Based Salient Object Detection | Peng Jiang, Nuno Vasconcelos, Jingliang Peng | In this work, we propose a generic scheme to promote any diffusion-based salient object detection algorithm by original ways to re-synthesize the diffusion matrix and construct the seed vector. |
26 | Nighttime Haze Removal With Glow and Multiple Light Colors | Yu Li, Robby T. Tan, Michael S. Brown | To address these effects, we introduce a new nighttime haze model that accounts for the varying light sources and their glow. |
27 | Conformal and Low-Rank Sparse Representation for Image Restoration | Jianwei Li, Xiaowu Chen, Dongqing Zou, Bo Gao, Wei Teng | In this paper, we propose a novel sparse representation approach called conformal and low-rank sparse representation (CLRSR) for image restoration problems. |
28 | Patch Group Based Nonlocal Self-Similarity Prior Learning for Image Denoising | Jun Xu, Lei Zhang, Wangmeng Zuo, David Zhang, Xiangchu Feng | In this paper, we propose a patch group (PG) based NSS prior learning scheme to learn explicit NSS models from natural images for high performance denoising. |
29 | Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability | Jingwei Huang, Huarong Chen, Bin Wang, Stephen Lin | We present an automatic thumbnail generation technique based on two essential considerations: how well they visually represent the original photograph, and how well the foreground can be recognized after the cropping and downsizing steps of thumbnailing. We propose a set of image features for modeling these two considerations of thumbnails, and learn how to balance their relative effects on thumbnail generation through training on image pairs composed of photographs and their corresponding thumbnails created by an expert photographer. |
30 | SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks | Xun Huang, Chengyao Shen, Xavier Boix, Qi Zhao | This paper presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Network (DNN). |
31 | A Novel Sparsity Measure for Tensor Recovery | Qian Zhao, Deyu Meng, Xu Kong, Qi Xie, Wenfei Cao, Yao Wang, Zongben Xu | In this paper, we propose a new sparsity regularizer for measuring the low-rank structure underneath a tensor. |
32 | Oriented Object Proposals | Shengfeng He, Rynson W.H. Lau | In this paper, we propose a new approach to generate oriented object proposals (OOPs) to reduce the detection error caused by various orientations of the object. |
33 | Learning Nonlinear Spectral Filters for Color Image Reconstruction | Michael Moeller, Julia Diebold, Guy Gilboa, Daniel Cremers | This paper presents the idea of learning optimal filters for color image reconstruction based on a novel concept of nonlinear spectral image decompositions recently proposed by Guy Gilboa. |
34 | Beyond White: Ground Truth Colors for Color Constancy Correction | Dongliang Cheng, Brian Price, Scott Cohen, Michael S. Brown | In this paper, we describe how to overcome this limitation. |
35 | RGB-Guided Hyperspectral Image Upsampling | Hyeokhyen Kwon, Yu-Wing Tai | In this paper, we present an algorithm to enhance and upsample the resolution of hyperspectral images. |
36 | Projection Onto the Manifold of Elongated Structures for Accurate Extraction | Amos Sironi, Vincent Lepetit, Pascal Fua | We solve this problem by projecting patches of the score map to their nearest neighbors in a set of ground truth training patches. |
37 | Naive Bayes Super-Resolution Forest | Jordi Salvador, Eduardo Perez-Pellitero | This paper presents a fast, high-performance method for super resolution with external learning. |
38 | POP Image Fusion – Derivative Domain Image Fusion Without Reintegration | Graham D. Finlayson, Alex E. Hayes | In this paper we avoid these hallucinated details by avoiding the reintegration step. |
39 | Adaptive Spatial-Spectral Dictionary Learning for Hyperspectral Image Denoising | Ying Fu, Antony Lam, Imari Sato, Yoichi Sato | In this paper, we propose an effective model for hyperspectral image (HSI) denoising that considers underlying characteristics of HSIs: sparsity across the spatial-spectral domain, high correlation across spectra, and non-local self-similarity over space. |
40 | Fully Connected Guided Image Filtering | Longquan Dai, Mengke Yuan, Feihu Zhang, Xiaopeng Zhang | This paper presents a linear time fully connected guided filter by introducing the minimum spanning tree (MST) to the guided filter (GF). |
41 | Segment Graph Based Image Filtering: Fast Structure-Preserving Smoothing | Feihu Zhang, Longquan Dai, Shiming Xiang, Xiaopeng Zhang | In our SGF, we use the tree distance on the segment graph to define the internal weight function of the filtering kernel, which enables the filter to smooth out high-contrast details and textures while preserving major image structures very well. |
42 | Deep Networks for Image Super-Resolution With Sparse Prior | Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang | In this paper, we argue that domain expertise represented by the conventional sparse coding model is still valuable, and it can be combined with the key ingredients of deep learning to achieve further improved results. |
43 | Convolutional Color Constancy | Jonathan T. Barron | In contrast, in this paper we reformulate the problem of color constancy as a 2D spatial localization task in a log-chrominance space, thereby allowing us to apply techniques from object detection and structured prediction to the color constancy problem. |
44 | Learning Ordinal Relationships for Mid-Level Vision | Daniel Zoran, Phillip Isola, Dilip Krishnan, William T. Freeman | We propose a framework that infers mid-level visual properties of an image by learning about ordinal relationships. |
45 | Thin Structure Estimation With Curvature Regularization | Dmitrii Marin, Yuchen Zhong, Maria Drangova, Yuri Boykov | We propose an objective function combining detection likelihoods with a prior minimizing curvature of the center-lines or surfaces. |
46 | HARF: Hierarchy-Associated Rich Features for Salient Object Detection | Wenbin Zou, Nikos Komodakis | To address such an issue, this paper proposes a novel hierarchy-associated feature construction framework for salient object detection, which is based on integrating elementary features from multi-level regions in a hierarchy. |
47 | Deep Colorization | Zezhou Cheng, Qingxiong Yang, Bin Sheng | Unlike the previous methods, this paper aims at a high-quality fully-automatic colorization method. |
48 | Image Matting With KL-Divergence Based Sparse Sampling | Levent Karacan, Aykut Erdem, Erkut Erdem | To alleviate this, in this paper we take an entirely new approach and formulate sampling as a sparse subset selection problem where we propose to pick a small set of candidate samples that best explains the unknown pixels. |
49 | Intrinsic Decomposition of Image Sequences From Local Temporal Variations | Pierre-Yves Laffont, Jean-Charles Bazin | We present a method for intrinsic image decomposition, which aims to decompose images into reflectance and shading layers. |
50 | Low-Rank Tensor Approximation With Laplacian Scale Mixture Modeling for Multiframe Image Denoising | Weisheng Dong, Guangyu Li, Guangming Shi, Xin Li, Yi Ma | In this work, we propose a novel low-rank tensor approximation framework with Laplacian Scale Mixture (LSM) modeling for multi-frame image denoising. |
51 | Learning Parametric Distributions for Image Super-Resolution: Where Patch Matching Meets Sparse Coding | Yongbo Li, Weisheng Dong, Guangming Shi, Xuemei Xie | In this paper, we propose to develop a hybrid approach toward SR by combining those two lines of ideas. |
52 | Improving Image Restoration With Soft-Rounding | Xing Mei, Honggang Qi, Bao-Gang Hu, Siwei Lyu | In this work, we describe an effective and efficient approach to incorporate the knowledge of distinct pixel values of the pristine images into the general regularized least squares restoration framework. |
53 | See the Difference: Direct Pre-Image Reconstruction and Pose Estimation by Differentiating HOG | Wei-Chen Chiu, Mario Fritz | We present our implementation of [?] |
54 | An Efficient Statistical Method for Image Noise Level Estimation | Guangyong Chen, Fengyuan Zhu, Pheng Ann Heng | In this paper, we address the problem of estimating noise level from a single image contaminated by additive zero-mean Gaussian noise. |
55 | Contour Detection and Characterization for Asynchronous Event Sensors | Francisco Barranco, Ching L. Teo, Cornelia Fermuller, Yiannis Aloimonos | This paper presents an approach to learn the location of contours and their border ownership using Structured Random Forests on event-based features that encode motion, timing, texture, and spatial orientations. |
56 | Class-Specific Image Deblurring | Saeed Anwar, Cong Phuoc Huynh, Fatih Porikli | In this paper, we explore the potential of a class-specific image prior for recovering spatial frequencies attenuated by the blurring process. |
57 | High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision | Gedas Bertasius, Jianbo Shi, Lorenzo Torresani | Inspired by this observation, in this work we show how to predict boundaries by exploiting object-level features from a pretrained object-classification network. |
58 | Variational Depth Superresolution Using Example-Based Edge Representations | David Ferstl, Matthias Ruther, Horst Bischof | In this paper we propose a novel method for depth image superresolution which combines recent advances in example based upsampling with variational superresolution based on a known blur kernel. |
59 | Conditioned Regression Models for Non-Blind Single Image Super-Resolution | Gernot Riegler, Samuel Schulter, Matthias Ruther, Horst Bischof | In this paper, we loosen this restrictive constraint and propose conditioned regression models (including convolutional neural networks and random forests) that can effectively exploit the additional kernel information during both training and inference. |
60 | Video Super-Resolution via Deep Draft-Ensemble Learning | Renjie Liao, Xin Tao, Ruiyu Li, Ziyang Ma, Jiaya Jia | We propose a new direction for fast video super-resolution (VideoSR) via a SR draft ensemble, which is defined as the set of high-resolution patch candidates before final image deconvolution. |
61 | Pan-Sharpening With a Hyper-Laplacian Penalty | Yiyong Jiang, Xinghao Ding, Delu Zeng, Yue Huang, John Paisley | We present a method for pan-sharpening in which a sparsity-promoting objective function preserves both spatial and spectral content, and is efficient to optimize. |
62 | Video Restoration Against Yin-Yang Phasing | Xiaolin Wu, Zhenhao Li, Xiaowei Deng | In this paper, we investigate the problem and propose a video restoration technique to suppress YYP artifacts and retain temporal consistency of object appearance via inter-frame, spatially-adaptive, optimal tone mapping. |
63 | Rolling Shutter Super-Resolution | Abhijith Punnappurath, Vijay Rengarajan, A.N. Rajagopalan | In this paper, we study the hitherto unexplored topic of multi-image SR in CMOS cameras. |
64 | Learning Large-Scale Automatic Image Colorization | Aditya Deshpande, Jason Rock, David Forsyth | We describe an automated method for image colorization that learns to colorize from examples. |
65 | Compression Artifacts Reduction by a Deep Convolutional Network | Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang | Existing algorithms either focus on removing blocking artifacts and produce blurred output, or restore sharpened images that are accompanied by ringing effects. |
66 | Multiple-Hypothesis Affine Region Estimation With Anisotropic LoG Filters | Takahiro Hasegawa, Mitsuru Ambai, Kohta Ishikawa, Gou Koutaki, Yuji Yamauchi, Takayoshi Yamashita, Hironobu Fujiyoshi | We propose a method for estimating multiple-hypothesis affine regions from a keypoint by using an anisotropic Laplacian-of-Gaussian (LoG) filter. |
67 | A Self-Paced Multiple-Instance Learning Framework for Co-Saliency Detection | Dingwen Zhang, Deyu Meng, Chao Li, Lu Jiang, Qian Zhao, Junwei Han | To alleviate this problem, we propose a novel framework for this task, by naturally reformulating it as a multiple-instance learning (MIL) problem and further integrating it into a self-paced learning (SPL) regime. |
68 | External Patch Prior Guided Internal Clustering for Image Denoising | Fei Chen, Lei Zhang, Huimin Yu | In this paper, we propose to exploit image external patch prior and internal self-similarity prior jointly, and develop an external patch prior guided internal clustering algorithm for image denoising. |
69 | Self-Calibration of Optical Lenses | Michael Hirsch, Bernhard Scholkopf | We propose a method that enables the self-calibration of lenses from a natural image, or a set of such images. |
70 | Illumination Robust Color Naming via Label Propagation | Yuanliu Liu, Zejian Yuan, Badong Chen, Jianru Xue, Nanning Zheng | In this paper we address the problem of inferring the color composition of the intrinsic reflectance of objects, where the shadows and highlights may change the observed color dramatically. For evaluation we collect three datasets of images under noticeable highlights and shadows. |
71 | Unsupervised Cross-Modal Synthesis of Subject-Specific Scans | Raviteja Vemulapalli, Hien Van Nguyen, Shaohua Kevin Zhou | Hence, to address this issue, we propose a general unsupervised cross-modal medical image synthesis approach that works without paired training data. |
72 | Learning to Boost Filamentary Structure Segmentation | Lin Gu, Li Cheng | Focusing on this issue, this paper proposes an iterative two-step learning-based approach to boost the performance based on a base segmenter arbitrarily chosen from a number of existing segmenters: We start with an initial partial segmentation where the filamentary structure obtained is of high confidence based on this existing segmenter. |
73 | Weakly-Supervised Structured Output Learning With Flexible and Latent Graphs Using High-Order Loss Functions | Gustavo Carneiro, Tingying Peng, Christine Bayer, Nassir Navab | We introduce two new structured output models that use a latent graph, which is flexible in terms of the number of nodes and structure, where the training process minimises a high-order loss function using a weakly annotated training set. |
74 | Efficient Classifier Training to Minimize False Merges in Electron Microscopy Segmentation | Toufiq Parag, Dan C. Ciresan, Alessandro Giusti | This study proposes a novel classifier training algorithm for EM segmentation aimed to reduce the amount of manual effort demanded by the groundtruth annotation and error refinement tasks. |
75 | On Statistical Analysis of Neuroimages With Imperfect Registration | Won Hwa Kim, Sathya N. Ravi, Sterling C. Johnson, Ozioma C. Okonkwo, Vikas Singh | In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. |
76 | Convex Optimization With Abstract Linear Operators | Steven Diamond, Stephen Boyd | We introduce a convex optimization modeling framework that transforms a convex optimization problem expressed in a form natural and convenient for the user into an equivalent cone program in a way that preserves fast linear transforms in the original problem. |
77 | Building Dynamic Cloud Maps From the Ground Up | Calvin Murdock, Nathan Jacobs, Robert Pless | We demonstrate how this imagery can be constructed “from the ground up” without requiring expensive geo-stationary satellites. |
78 | A Versatile Learning-Based 3D Temporal Tracker: Scalable, Robust, Online | David Joseph Tan, Federico Tombari, Slobodan Ilic, Nassir Navab | This paper proposes a temporal tracking algorithm based on Random Forest that uses depth images to estimate and track the 3D pose of a rigid object in real-time. |
79 | Realtime Edge-Based Visual Odometry for a Monocular Camera | Juan Jose Tarrio, Sol Pedre | In this work we present a novel algorithm for realtime visual odometry for a monocular camera. |
80 | Fill and Transfer: A Simple Physics-Based Approach for Containability Reasoning | Lap-Fai Yu, Noah Duncan, Sai-Kit Yeung | In this paper we introduce a novel approach to reason about liquid containability – the affordance of containing liquid. |
81 | On Linear Structure From Motion for Light Field Cameras | Ole Johannsen, Antonin Sulc, Bastian Goldluecke | We present a novel approach to relative pose estimation which is tailored to 4D light field cameras. |
82 | 3D Object Reconstruction From Hand-Object Interactions | Dimitrios Tzionas, Juergen Gall | In this work, we show that extracting 3D hand motion for in-hand scanning effectively facilitates the reconstruction of even featureless and highly symmetric objects and we present an approach that fuses the rich additional information of hands into a 3D reconstruction pipeline, significantly contributing to the state-of-the-art of in-hand scanning. |
83 | Minimal Solvers for 3D Geometry From Satellite Imagery | Enliang Zheng, Ke Wang, Enrique Dunn, Jan-Michael Frahm | We propose two novel minimal solvers which advance the state of the art in satellite imagery processing. |
84 | An Efficient Minimal Solution for Multi-Camera Motion | Jonathan Ventura, Clemens Arth, Vincent Lepetit | We propose an efficient method for estimating the motion of a multi-camera rig from a minimal set of feature correspondences. |
85 | Learning Shape, Motion and Elastic Models in Force Space | Antonio Agudo, Francesc Moreno-Noguer | In this paper, we address the problem of simultaneously recovering the 3D shape and pose of a deformable and potentially elastic object from 2D motion. |
86 | A Versatile Scene Model With Differentiable Visibility Applied to Generative Pose Estimation | Helge Rhodin, Nadia Robertini, Christian Richardt, Hans-Peter Seidel, Christian Theobalt | We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. |
87 | Semantic Pose Using Deep Networks Trained on Synthetic RGB-D | Jeremie Papon, Markus Schoeler | In this work we address the problem of indoor scene understanding from RGB-D images. |
88 | Exploiting High Level Scene Cues in Stereo Reconstruction | Simon Hadfield, Richard Bowden | We present a novel approach to 3D reconstruction which is inspired by the human visual system. |
89 | Point Triangulation Through Polyhedron Collapse Using the l∞ Norm | Simon Donne, Bart Goossens, Wilfried Philips | In this paper we present a novel method for L-infinity triangulation that minimizes the L-infinity norm of the L-infinity reprojection errors: this apparently small difference leads to a much faster but equally accurate solution which is related to the MLE under the assumption of uniform noise. |
90 | Optimizing the Viewing Graph for Structure-From-Motion | Chris Sweeney, Torsten Sattler, Tobias Hollerer, Matthew Turk, Marc Pollefeys | In this paper, we take a fundamentally different approach to SfM and instead focus on improving the quality of the viewing graph before applying SfM. |
91 | Intrinsic Scene Decomposition From RGB-D Images | Mohammed Hachama, Bernard Ghanem, Peter Wonka | In this paper, we address the problem of computing an intrinsic decomposition of the colors of a surface into an albedo and a shading term. |
92 | 3D Hand Pose Estimation Using Randomized Decision Forest With Segmentation Index Points | Peiyi Li, Haibin Ling, Xi Li, Chunyuan Liao | In this paper, we propose a real-time 3D hand pose estimation algorithm using the randomized decision forest framework. |
93 | Accurate Camera Calibration Robust to Defocus Using a Smartphone | Hyowon Ha, Yunsu Bok, Kyungdon Joo, Jiyoung Jung, In So Kweon | We propose a novel camera calibration method for defocused images using a smartphone under the assumption that the defocus blur is modeled as a convolution of a sharp image with a Gaussian point spread function (PSF). |
94 | High Quality Structure From Small Motion for Rolling Shutter Cameras | Sunghoon Im, Hyowon Ha, Gyeongmin Choe, Hae-Gon Jeon, Kyungdon Joo, In So Kweon | We present a practical 3D reconstruction method to obtain a high-quality dense depth map from narrow-baseline image sequences captured by commercial digital cameras, such as DSLRs or mobile phones. |
95 | Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction | Paulo F. U. Gotardo, Tomas Simon, Yaser Sheikh, Iain Matthews | This paper proposes photogeometric scene flow (PGSF) for high-quality dynamic 3D reconstruction. |
96 | Blur-Aware Disparity Estimation From Defocus Stereo Images | Ching-Hui Chen, Hui Zhou, Timo Ahonen | We propose a blur-aware disparity estimation method that is robust to the mismatch of focus in stereo images. |
97 | Global Structure-From-Motion by Similarity Averaging | Zhaopeng Cui, Ping Tan | We propose to compute a sparse depth image at each camera to solve both problems. |
98 | Massively Parallel Multiview Stereopsis by Surface Normal Diffusion | Silvano Galliani, Katrin Lasinger, Konrad Schindler | We present a new, massively parallel method for high-quality multiview matching. |
99 | Variational PatchMatch MultiView Reconstruction and Refinement | Philipp Heise, Brian Jensen, Sebastian Klose, Alois Knoll | In this work we propose a novel approach to the problem of multi-view stereo reconstruction. |
100 | As-Rigid-As-Possible Volumetric Shape-From-Template | Shaifali Parashar, Daniel Pizarro, Adrien Bartoli, Toby Collins | We present strategies to find an initial solution based on thin-shell SfT and volume propagation. |
101 | General Dynamic Scene Reconstruction From Multiple View Video | Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton | This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. |
102 | The Joint Image Handbook | Matthew Trager, Martial Hebert, Jean Ponce | We revisit in this paper the geometric and algebraic properties of the joint image, and address fundamental questions such as how many and which multilinearities are necessary and/or sufficient to determine camera geometry and/or image correspondences. |
103 | Direct, Dense, and Deformable: Template-Based Non-Rigid 3D Reconstruction From RGB Video | Rui Yu, Chris Russell, Neill D. F. Campbell, Lourdes Agapito | In this paper we tackle the problem of capturing the dense, detailed 3D geometry of generic, complex non-rigid meshes using a single RGB-only commodity video camera and a direct approach. |
104 | Single Image Pop-Up From Discriminatively Learned Parts | Menglong Zhu, Xiaowei Zhou, Kostas Daniilidis | We introduce a new approach for estimating a fine grained 3D shape and continuous pose of an object from a single image. |
105 | Learning Informative Edge Maps for Indoor Scene Layout Prediction | Arun Mallya, Svetlana Lazebnik | In this paper, we introduce new edge-based features for the task of recovering the 3D layout of an indoor scene from a single image. |
106 | Multi-View Convolutional Neural Networks for 3D Shape Recognition | Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller | We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. |
107 | Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images | Alexander Krull, Eric Brachmann, Frank Michel, Michael Ying Yang, Stefan Gumhold, Carsten Rother | We propose an approach that “learns to compare”, while taking these difficulties into account. |
108 | 3D Surface Profilometry Using Phase Shifting of De Bruijn Pattern | Matea Donlic, Tomislav Petkovic, Tomislav Pribanic | A novel structured light method for color 3D surface profilometry is proposed. |
109 | A Deep Visual Correspondence Embedding Model for Stereo Matching Costs | Zhuoyuan Chen, Xun Sun, Liang Wang, Yinan Yu, Chang Huang | This paper presents a data-driven matching cost for stereo matching. |
110 | Learning Concept Embeddings With Combined Human-Machine Expertise | Michael Wilber, Iljung S. Kwak, David Kriegman, Serge Belongie | This paper presents our work on “SNaCK,” a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. |
111 | Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation | Xin Lu, Zhe Lin, Xiaohui Shen, Radomir Mech, James Z. Wang | We propose a deep multi-patch aggregation network training approach, which allows us to train models using multiple patches generated from one image. |
112 | Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection | Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan | Inspired by these observations, we propose a computational model for weakly-supervised object detection, based on prior knowledge modelling, exemplar learning and learning with video contexts. |
113 | Improving Image Classification With Location Context | Kevin Tang, Manohar Paluri, Li Fei-Fei, Rob Fergus, Lubomir Bourdev | In this work, we tackle the problem of performing image classification with location context, in which we are given the GPS coordinates for images in both the train and test phases. To evaluate our model and to help promote research in this area, we identify a set of location-sensitive concepts and annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has GPS coordinates with these concepts, which we make publicly available. |
114 | HICO: A Benchmark for Recognizing Human-Object Interactions in Images | Yu-Wei Chao, Zhan Wang, Yugeng He, Jiaxuan Wang, Jia Deng | We introduce a new benchmark “Humans Interacting with Common Objects” (HICO) for recognizing human-object interactions (HOI). |
115 | Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | In this work, we study rectifier neural networks for image classification from two aspects. |
116 | Continuous Pose Estimation With a Spatial Ensemble of Fisher Regressors | Michele Fenzi, Laura Leal-Taixe, Jorn Ostermann, Tinne Tuytelaars | In this paper, we treat the problem of continuous pose estimation for object categories as a regression problem on the basis of only 2D training information. |
117 | Adaptive Hashing for Fast Similarity Search | Fatih Cakir, Stan Sclaroff | To overcome these challenges, we propose an online learning algorithm based on stochastic gradient descent in which the hash functions are updated iteratively with streaming data. |
118 | Single Image 3D Without a Single 3D Image | David F. Fouhey, Wajahat Hussain, Abhinav Gupta, Martial Hebert | In this paper, we show that one can learn a mapping from appearance to 3D properties without ever seeing a single explicit 3D label. |
119 | Cross-Domain Image Retrieval With a Dual Attribute-Aware Ranking Network | Junshi Huang, Rogerio S. Feris, Qiang Chen, Shuicheng Yan | We address the problem of cross-domain image retrieval, considering the following practical application: given a user photo depicting a clothing image, our goal is to retrieve the same or attribute-similar clothing items from online shopping stores. |
120 | Attribute-Graph: A Graph Based Approach to Image Ranking | Nikita Prabhu, R. Venkatesh Babu | We propose a novel image representation, termed Attribute-Graph, to rank images by their semantic similarity to a given query image. |
121 | Contextual Action Recognition With R*CNN | Georgia Gkioxari, Ross Girshick, Jitendra Malik | In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. |
122 | What Makes an Object Memorable? | Rachit Dubey, Joshua Peterson, Aditya Khosla, Ming-Hsuan Yang, Bernard Ghanem | In this paper, we provide the first attempt to answer the question: what exactly is remembered about an image? |
123 | kNN Hashing With Factorized Neighborhood Representation | Kun Ding, Chunlei Huo, Bin Fan, Chunhong Pan | Based on the observation that retrieval precision is highly related to the kNN classification accuracy, this paper proposes a novel kNN-based supervised hashing method, which learns hash functions by directly maximizing the kNN accuracy of the Hamming-embedded training data. |
124 | Multi-View Complementary Hash Tables for Nearest Neighbor Search | Xianglong Liu, Lei Huang, Cheng Deng, Jiwen Lu, Bo Lang | For a single multi-view table, using exemplar-based feature fusion, we approximate the inherent data similarities with a low-rank matrix, and learn discriminative hash functions in an efficient way. |
125 | Scalable Person Re-Identification: A Benchmark | Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, Qi Tian | As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor. |
126 | MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition | Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham | Motivated by the intuition that different modalities should contain not only some modal-specific patterns but also some shared common patterns, we propose a multi-modal feature learning framework for RGB-D object recognition. |
127 | Object Detection via a Multi-Region and Semantic Segmentation-Aware CNN Model | Spyros Gidaris, Nikos Komodakis | We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. |
128 | Neural Activation Constellations: Unsupervised Part Model Discovery With Convolutional Networks | Marcel Simon, Erik Rodner | We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. |
129 | Cascaded Sparse Spatial Bins for Efficient and Effective Generic Object Detection | David Novotny, Jiri Matas | A novel efficient method for extraction of object proposals is introduced. |
130 | Probabilistic Label Relation Graphs With Ising Models | Nan Ding, Jia Deng, Kevin P. Murphy, Hartmut Neven | In this paper, we extend the HEX model to allow for soft or probabilistic relations between labels, which is useful when there is uncertainty about the relationship between two labels (e.g., an antelope is “sort of” furry, but not to the same degree as a grizzly bear). |
131 | Predicting Good Features for Image Geo-Localization Using Per-Bundle VLAD | Hyo Jin Kim, Enrique Dunn, Jan-Michael Frahm | We address the problem of recognizing a place depicted in a query image by using a large database of geo-tagged images at a city-scale. |
132 | Task-Driven Feature Pooling for Image Classification | Guo-Sen Xie, Xu-Yao Zhang, Xiangbo Shu, Shuicheng Yan, Cheng-Lin Liu | In this paper, we propose a novel task-driven pooling (TDP) model to directly learn the pooled representation from data in a discriminative manner. |
133 | Cutting Edge: Soft Correspondences in Multimodal Scene Parsing | Sarah Taghavi Namin, Mohammad Najafi, Mathieu Salzmann, Lars Petersson | In this paper, we address the problem of data misalignment and label inconsistencies, e.g., due to moving objects, in semantic labeling, which violate the assumption of existing techniques. |
134 | One Shot Learning via Compositions of Meaningful Patches | Alex Wong, Alan L. Yuille | We propose an unsupervised method for learning a compact dictionary of image patches representing meaningful components of an object. |
135 | FASText: Efficient Unconstrained Scene Text Detector | Michal Busta, Lukas Neumann, Jiri Matas | We propose a novel easy-to-implement stroke detector based on an efficient pixel intensity comparison to surrounding pixels. |
136 | Multi-Scale Recognition With DAG-CNNs | Songfan Yang, Deva Ramanan | We present extensive analysis and demonstrate state-of-the-art classification performance on three standard scene benchmarks (SUN397, MIT67, and Scene15). |
137 | Relaxed Multiple-Instance SVM With Application to Object Discovery | Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai | In this paper, we propose a novel method to solve the classical multiple-instance learning (MIL) problem, named relaxed multiple-instance SVM (RMI-SVM). |
138 | Im2Calories: Towards an Automated Mobile Vision Food Diary | Austin Meyers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, Kevin P. Murphy | We present CNN-based approaches to these problems, with promising preliminary results. |
139 | LEWIS: Latent Embeddings for Word Images and their Semantics | Albert Gordo, Jon Almazan, Naila Murray, Florent Perronin | The goal of this work is to bring semantics into the tasks of text recognition and retrieval in natural images. |
140 | Per-Sample Kernel Adaptation for Visual Recognition and Grouping | Borislav Antic, Bjorn Ommer | Our goal is, therefore, to adjust the contribution of individual feature dimensions when comparing any two samples and computing their similarity. |
141 | Fine-Grained Change Detection of Misaligned Scenes With Varied Illuminations | Wei Feng, Fei-Peng Tian, Qian Zhang, Nan Zhang, Liang Wan, Jizhou Sun | This paper proposes a feasible end-to-end approach to this challenging problem. We build three real-world datasets to benchmark fine-grained change detection of misaligned scenes under varied multiple lighting conditions. |
142 | Aggregating Local Deep Features for Image Retrieval | Artem Babenko, Victor Lempitsky | In this paper we investigate possible ways to aggregate local deep features to produce compact descriptors for image retrieval. |
143 | Learning Deep Object Detectors From 3D Models | Xingchao Peng, Baochen Sun, Karim Ali, Kate Saenko | In a detailed analysis, we use synthetic CAD images to probe the ability of DCNN to learn without these cues, with surprising findings. |
144 | Harvesting Discriminative Meta Objects With Deep CNN Features for Scene Classification | Ruobing Wu, Baoyuan Wang, Wenping Wang, Yizhou Yu | In this paper, we present a novel pipeline built upon deep CNN features to harvest discriminative visual objects and parts for scene classification. |
145 | Scalable Nonlinear Embeddings for Semantic Category-Based Image Retrieval | Gaurav Sharma, Bernt Schiele | We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low dimensional Euclidean space. |
146 | Person Re-Identification Ranking Optimisation by Discriminant Context Information Analysis | Jorge Garcia, Niki Martinel, Christian Micheloni, Alfredo Gardel | In this paper, we focus on such a problem and introduce an unsupervised ranking optimization approach based on discriminant context information analysis. |
147 | Unsupervised Generation of a Viewpoint Annotated Car Dataset From Videos | Nima Sedaghat, Thomas Brox | In this paper, we present an approach that creates a dataset of images annotated with bounding boxes and viewpoint labels in a fully automated manner from videos. |
148 | Structured Indoor Modeling | Satoshi Ikehata, Hang Yang, Yasutaka Furukawa | This paper presents a novel 3D modeling framework that reconstructs an indoor scene as a structured model from panorama RGBD images. |
149 | 3D Time-Lapse Reconstruction From Internet Photos | Ricardo Martin-Brualla, David Gallup, Steven M. Seitz | 3D Time-Lapse Reconstruction From Internet Photos |
150 | Global, Dense Multiscale Reconstruction for a Billion Points | Benjamin Ummenhofer, Thomas Brox | We present a variational approach for surface reconstruction from a set of oriented points with scale information. |
151 | On the Visibility of Point Clouds | Sagi Katz, Ayellet Tal | We show that three such properties are sufficient: the sign of the function, monotonicity, and a condition regarding the function’s parameter. |
152 | Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts | Niloufar Pourian, S. Karthikeyan, B.S. Manjunath | We present a weakly-supervised approach to semantic segmentation. |
153 | Piecewise Flat Embedding for Image Segmentation | Yizhou Yu, Chaowei Fang, Zicheng Liao | In this paper, we propose a new nonlinear embedding, called piecewise flat embedding, for image segmentation. |
154 | Semantic Image Segmentation via Deep Parsing Network | Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen-Change Loy, Xiaoou Tang | Specifically, DPN extends a contemporary CNN architecture to model unary terms and additional layers are carefully devised to approximate the mean field algorithm (MF) for pairwise terms. |
155 | Human Parsing With Contextualized Convolutional Neural Network | Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan | In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. |
156 | Holistically-Nested Edge Detection | Saining Xie, Zhuowen Tu | We develop a new edge detection algorithm that addresses two critical issues in this long-standing vision problem: (1) holistic image training; and (2) multi-scale feature learning. |
157 | Minimum Barrier Salient Object Detection at 80 FPS | Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech | We propose a highly efficient, yet powerful, salient object detection method based on the Minimum Barrier Distance (MBD) Transform. |
158 | Learning Image Representations Tied to Ego-Motion | Dinesh Jayaraman, Kristen Grauman | We propose to exploit proprioceptive motor signals to provide unsupervised regularization in convolutional neural networks to learn visual representations from egocentric video. |
159 | Unsupervised Visual Representation Learning by Context Prediction | Carl Doersch, Abhinav Gupta, Alexei A. Efros | This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. |
160 | Webly Supervised Learning of Convolutional Networks | Xinlei Chen, Abhinav Gupta | We present an approach to utilize large amounts of web data for learning CNNs. |
161 | Fast R-CNN | Ross Girshick | This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. |
162 | Bilinear CNN Models for Fine-Grained Visual Recognition | Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji | We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an image descriptor. |
163 | Discovering the Spatial Extent of Relative Attributes | Fanyi Xiao, Yong Jae Lee | We present a weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images. |
164 | Deep Neural Decision Forests | Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, Samuel Rota Bulo | To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. |
165 | Deep Fried Convnets | Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang | In this paper we show how kernel methods, in particular a single Fastfood layer, can be used to replace the fully connected layers in a deep convolutional neural network. |
166 | Semantic Component Analysis | Calvin Murdock, Fernando De la Torre | To address the issues of previous work, we present a simple but effective method called Semantic Component Analysis (SCA), which provides a decomposition of images into semantic components. |
167 | Low-Rank Matrix Factorization Under General Mixture Noise Distributions | Xiangyong Cao, Yang Chen, Qian Zhao, Deyu Meng, Yao Wang, Dong Wang, Zongben Xu | To make LRMF capable of adapting to more complex noise, this paper proposes a new LRMF model by assuming noise as Mixture of Exponential Power (MoEP) distributions and proposes a penalized MoEP model by combining the penalized likelihood method with MoEP distributions. |
168 | Web-Scale Image Clustering Revisited | Yannis Avrithis, Yannis Kalantidis, Evangelos Anagnostopoulos, Ioannis Z. Emiris | Combined with powerful deep learned representations, we achieve clustering of a 100 million image collection on a single machine in less than one hour. |
169 | Learning Discriminative Reconstructions for Unsupervised Outlier Removal | Yan Xia, Xudong Cao, Fang Wen, Gang Hua, Jian Sun | We study the problem of automatically removing outliers from noisy data, with application for removing outlier images from an image collection. |
170 | Learning Deconvolution Network for Semantic Segmentation | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deep deconvolution network. |
171 | Conditional Random Fields as Recurrent Neural Networks | Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr | To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. |
172 | The One Triangle Three Parallelograms Sampling Strategy and Its Application in Shape Regression | Mikael Nilsson | The purpose of this paper is threefold. |
173 | Boosting Object Proposals: From Pascal to COCO | Jordi Pont-Tuset, Luc Van Gool | In light of these results, we propose various lines of research to take advantage of the new benchmark and improve the techniques. |
174 | Secrets of GrabCut and Kernel K-Means | Meng Tang, Ismail Ben Ayed, Dmitrii Marin, Yuri Boykov | We propose an alternative approach to color clustering using kernel K-means energy with well-known properties such as non-linear separation and scalability to higher-dimensional feature spaces. |
175 | Video Matting via Sparse and Low-Rank Representation | Dongqing Zou, Xiaowu Chen, Guangying Cao, Xiaogang Wang | We introduce a novel method of video matting via sparse and low-rank representation. |
176 | Joint Object and Part Segmentation Using Deep Learned Potentials | Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan L. Yuille | In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. |
177 | Low-Rank Tensor Constrained Multiview Subspace Clustering | Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, Xiaochun Cao | In this paper, we explore the problem of multiview subspace clustering. |
178 | BodyPrint: Pose Invariant 3D Shape Matching of Human Bodies | Jiangping Wang, Kai Ma, Vivek Kumar Singh, Thomas Huang, Terrence Chen | We address this problem by proposing a novel holistic human body shape descriptor called BodyPrint. |
179 | The Middle Child Problem: Revisiting Parametric Min-Cut and Seeds for Object Proposals | Ahmad Humayun, Fuxin Li, James M. Rehg | We propose a new energy minimization framework incorporating geodesic distances between segments which solves this problem. |
180 | Contour Guided Hierarchical Model for Shape Matching | Yuanqi Su, Yuehu Liu, Bonan Cuan, Nanning Zheng | In this paper, we present a novel algorithm that reconsiders these connections and reduces the global matching to a set of interrelated local matchings. |
181 | Robust Image Segmentation Using Contour-Guided Color Palettes | Xiang Fu, Chien-Yi Wang, Chen Chen, Changhu Wang, C.-C. Jay Kuo | Robust Image Segmentation Using Contour-Guided Color Palettes |
182 | Joint Optimization of Segmentation and Color Clustering | Ekaterina Lobacheva, Olga Veksler, Yuri Boykov | We propose to make clustering an integral part of segmentation, by including a new clustering term in the energy function. |
183 | BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation | Jifeng Dai, Kaiming He, Jian Sun | In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. |
184 | Detection and Segmentation of 2D Curved Reflection Symmetric Structures | Ching L. Teo, Cornelia Fermuller, Yiannis Aloimonos | In this work, we propose a complete approach that links the detection of curved reflection symmetries to produce symmetry-constrained segments of structures/regions in real images with clutter. |
185 | Unsupervised Tube Extraction Using Transductive Learning and Dense Trajectories | Mihai Marian Puscas, Enver Sangineto, Dubravko Culibrk, Nicu Sebe | Specifically, we propose to use Dense Trajectories in order to robustly match and track candidate boxes over different frames. |
186 | Compositional Hierarchical Representation of Shape Manifolds for Classification of Non-Manifold Shapes | Mete Ozay, Umit Rusen Aktas, Jeremy L. Wyatt, Ales Leonardis | To this end, we introduce a framework to implement the indexing mechanisms for the employment of the vocabulary for structural shape classification. |
187 | Shell PCA: Statistical Shape Modelling in Shell Space | Chao Zhang, Behrend Heeren, Martin Rumpf, William A. P. Smith | In this paper we describe how to perform Principal Components Analysis in “shell space”. |
188 | Learning to Combine Mid-Level Cues for Object Proposal Generation | Tom Lee, Sanja Fidler, Sven Dickinson | In this paper, we introduce Parametric Min-Loss (PML), a novel structured learning framework for parametric energy functions. |
189 | Enhancing Road Maps by Parsing Aerial Images Around the World | Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun | In this paper we propose to exploit aerial images in order to enhance freely available world maps. |
190 | Probabilistic Appearance Models for Segmentation and Classification | Julia Kruger, Jan Ehrhardt, Heinz Handels | We propose the use of probabilistic correspondences for statistical appearance models by incorporating appearance information into the framework. |
191 | A Randomized Ensemble Approach to Industrial CT Segmentation | Hyojin Kim, Jayaraman J. Thiagarajan, Peer-Timo Bremer | This paper presents a new ensemble-based segmentation framework for industrial CT images demonstrating that comparatively simple models and randomization strategies can significantly improve the result over existing techniques. |
192 | Semi-Supervised Normalized Cuts for Image Segmentation | Selene E. Chew, Nathan D. Cahill | In this paper, we reformulate NCuts to allow both sets of constraints to be handled in a soft manner, enabling the user to tune the degree to which the constraints are satisfied. |
193 | StereoSnakes: Contour Based Consistent Object Extraction For Stereo Images | Ran Ju, Tongwei Ren, Gangshan Wu | In this paper, we propose a contour based method which searches for consistent object contours instead of regions. |
194 | Semantic Segmentation of RGBD Images With Mutex Constraints | Zhuo Deng, Sinisa Todorovic, Longin Jan Latecki | In this paper, we address the problem of semantic scene segmentation of RGB-D images of indoor scenes. |
195 | Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation | George Papandreou, Liang-Chieh Chen, Kevin P. Murphy, Alan L. Yuille | We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. |
196 | Efficient Decomposition of Image and Mesh Graphs by Lifted Multicuts | Margret Keuper, Evgeny Levinkov, Nicolas Bonneel, Guillaume Lavoue, Thomas Brox, Bjorn Andres | We propose a generalization of the Multicut Problem (MP) with long-range terms (LMP). |
197 | Parsimonious Labeling | Puneet K. Dokania, M. Pawan Kumar | We propose a new family of discrete energy minimization problems, which we call parsimonious labeling. |
198 | Volumetric Bias in Segmentation and Reconstruction: Secrets and Solutions | Yuri Boykov, Hossam Isack, Carl Olsson, Ismail Ben Ayed | Our general ideas apply to continuous or discrete energy formulations in segmentation, stereo, and other reconstruction problems. |
199 | Entropy Minimization for Convex Relaxation Approaches | Mohamed Souiai, Martin R. Oswald, Youngwook Kee, Junmo Kim, Marc Pollefeys, Daniel Cremers | In this paper, we propose a novel relaxation technique which incorporates the entropy of the objective variable as a measure of relaxation tightness. |
200 | Adaptively Unified Semi-Supervised Dictionary Learning With Active Points | Xiaobo Wang, Xiaojie Guo, Stan Z. Li | In this paper, we present a novel semi-supervised dictionary learning method, which uses the informative coding vectors of both labeled and unlabeled data, and adaptively emphasizes the high confidence coding vectors of unlabeled data to enhance the dictionary discriminative capability simultaneously. |
201 | Constrained Convolutional Neural Networks for Weakly Supervised Segmentation | Deepak Pathak, Philipp Krahenbuhl, Trevor Darrell | We present an approach to learn a dense pixel-wise labeling from image-level tags. |
202 | A Multiscale Variable-Grouping Framework for MRF Energy Minimization | Omer Meir, Meirav Galun, Stav Yagev, Ronen Basri, Irad Yavneh | We present a multiscale approach for minimizing the energy associated with Markov Random Fields (MRFs) with energy functions that include arbitrary pairwise potentials. |
203 | Inferring M-Best Diverse Labelings in a Single One | Alexander Kirillov, Bogdan Savchynskyy, Dmitrij Schlesinger, Dmitry Vetrov, Carsten Rother | We show that the method of Batra et al. can be considered as a greedy approximate algorithm for our model, whereas we introduce an efficient specialized optimization technique for it, based on alpha-expansion. |
204 | Convolutional Sparse Coding for Image Super-Resolution | Shuhang Gu, Wangmeng Zuo, Qi Xie, Deyu Meng, Xiangchu Feng, Lei Zhang | In this paper, we propose a convolutional sparse coding (CSC) based SR (CSC-SR) method to address the consistency issue. |
205 | A Wavefront Marching Method for Solving the Eikonal Equation on Cartesian Grids | Brais Cancela, Marcos Ortega, Manuel G. Penedo | This paper presents a new wavefront propagation method for dealing with the classic Eikonal equation. |
206 | A Projection Free Method for Generalized Eigenvalue Problem With a Nonsmooth Regularizer | Seong Jae Hwang, Maxwell D. Collins, Sathya N. Ravi, Vamsi K. Ithapu, Nagesh Adluru, Sterling C. Johnson, Vikas Singh | Motivated by these needs, this paper presents an optimization scheme to solve generalized eigenvalue problems (GEP) involving a (nonsmooth) regularizer. |
207 | Optimizing Expected Intersection-Over-Union With Candidate-Constrained CRFs | Faruk Ahmed, Dany Tarlow, Dhruv Batra | We study the question of how to make loss-aware predictions in image segmentation settings where the evaluation function is the Intersection-over-Union (IoU) measure that is used widely in evaluating image segmentation systems. |
208 | Higher-Order Inference for Multi-Class Log-Supermodular Models | Jian Zhang, Josip Djolonga, Andreas Krause | We formalize this task as performing inference in log-supermodular models under partition constraints, and present an efficient variational inference technique. |
209 | Depth-Based Hand Pose Estimation: Data, Methods, and Challenges | James S. Supancic III, Gregory Rogez, Yi Yang, Jamie Shotton, Deva Ramanan | To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. |
210 | Adaptive Dither Voting for Robust Spatial Verification | Xiaomeng Wu, Kunio Kashino | To handle this problem, we propose a new method, called adaptive dither voting, for robust spatial verification. |
211 | Alternating Co-Quantization for Cross-Modal Hashing | Go Irie, Hiroyuki Arai, Yukinobu Taniguchi | We propose a method to minimize the binary quantization errors, which is tailored to cross-modal hashing. |
212 | Learning Deep Representation With Large-Scale Attributes | Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang | This paper contributes a large-scale object attribute database (the dataset is available at www.ee.cuhk.edu.hk/~xgwang/ImageNetAttribute.html) that contains rich attribute annotations (over 300 attributes) for ~180k samples and 494 object classes. |
213 | Deep Learning Strong Parts for Pedestrian Detection | Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang | Unlike previous deep models that directly learned a single detector for pedestrian detection, we propose DeepParts, which consists of extensive part detectors. |
214 | Flowing ConvNets for Human Pose Estimation in Videos | Tomas Pfister, James Charles, Andrew Zisserman | The objective of this work is human pose estimation in videos, where multiple frames are available. |
215 | Top Rank Supervised Binary Coding for Visual Search | Dongjin Song, Wei Liu, Rongrong Ji, David A. Meyer, John R. Smith | In this paper, we propose a novel supervised binary coding approach, namely Top Rank Supervised Binary Coding (Top-RSBC), which explicitly focuses on optimizing the precision of top positions in a Hamming-distance ranking list towards preserving the supervision information. |
216 | BubbLeNet: Foveated Imaging for Visual Discovery | Kevin Matzen, Noah Snavely | We propose a new method for turning an Internet-scale corpus of categorized images into a small set of human-interpretable discriminative visual elements using powerful tools based on deep learning. |
217 | PQTable: Fast Exact Asymmetric Distance Neighbor Search for Product Quantization Using Hash Tables | Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa | We propose the product quantization table (PQTable), a product quantization-based hash table that is fast and requires neither parameter tuning nor training steps. |
218 | Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions | Sven Bambach, Stefan Lee, David J. Crandall, Chen Yu | We develop methods to locate and distinguish between hands in egocentric video using strong appearance models with Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. |
219 | Fast and Accurate Head Pose Estimation via Random Projection Forests | Donghoon Lee, Ming-Hsuan Yang, Songhwai Oh | In this paper, we consider the problem of estimating the gaze direction of a person from a low-resolution image. |
220 | An MRF-Poselets Model for Detecting Highly Articulated Humans | Duc Thanh Nguyen, Minh-Khoi Tran, Sai-Kit Yeung | This paper proposes a novel part-based model built upon poselets, a notion of parts, and Markov Random Field (MRF) for modelling the human body structure under the variation of human poses and viewpoints. |
221 | Beyond Tree Structure Models: A New Occlusion Aware Graphical Model for Human Pose Estimation | Lianrui Fu, Junge Zhang, Kaiqi Huang | We propose an occlusion aware graphical model which is able to model both self-occlusion and occlusion by the other objects simultaneously. |
222 | Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging | Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui | In this paper, we propose a weakly-supervised deep learning model which can be trained from the readily available Web images to relax the dependence on human labor and scale up to arbitrary tags (categories). |
223 | Visual Phrases for Exemplar Face Detection | Vijay Kumar, Anoop Namboodiri, C. V. Jawahar | In this paper, we propose a novel approach that incorporates higher order information in the voting process. |
224 | Spatial Semantic Regularisation for Large Scale Object Detection | Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, Trevor Darrell | We propose a new multi-class spatial semantic regularisation method based on affinity propagation clustering, which simultaneously optimises across all categories and all proposed locations in the image, to improve both the localisation and categorisation of selected detection proposals. |
225 | Human Pose Estimation in Videos | Dong Zhang, Mubarak Shah | In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. |
226 | Contour Box: Rejecting Object Proposals Without Explicit Closed Contours | Cewu Lu, Shu Liu, Jiaya Jia, Chi-Keung Tang | We propose a new measure subject to the completeness and tightness constraints, where the optimized closed contour should be tightly bounded within an object proposal. |
227 | Registering Images to Untextured Geometry Using Average Shading Gradients | Tobias Plotz, Stefan Roth | In this paper we consider the registration of photographs to 3D models even when no texture information is available. |
228 | Robust Nonrigid Registration by Convex Optimization | Qifeng Chen, Vladlen Koltun | We present an approach to nonrigid registration of 3D surfaces. |
229 | Robust and Optimal Sum-of-Squares-Based Point-to-Plane Registration of Image Sets and Structured Scenes | Danda Pani Paudel, Adlane Habed, Cedric Demonceaux, Pascal Vasseur | For the first time in this paper, a Sum-of-Squares optimization theory framework is employed for identifying point-to-plane mismatches (i.e. outliers) with certainty. |
230 | MeshStereo: A Global Stereo Model With Mesh Alignment Regularization for View Interpolation | Chi Zhang, Zhiwei Li, Yanhua Cheng, Rui Cai, Hongyang Chao, Yong Rui | We present a novel global stereo model designed for view interpolation. |
231 | CV-HAZOP: Introducing Test Data Validation for Computer Vision | Oliver Zendel, Markus Murschitz, Martin Humenberger, Wolfgang Herzner | In this paper we propose a new solution answering these questions using a standard procedure devised by the safety community to validate complex systems: The Hazard and Operability Analysis (HAZOP). |
232 | Structure From Motion Using Structure-Less Resection | Enliang Zheng, Changchang Wu | In this work, we take the collection of already reconstructed cameras as a generalized camera, and determine the absolute pose of a candidate pinhole camera from pure 2D correspondences, which we call the semi-generalized camera pose problem. |
233 | Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-View Stereo | Runze Zhang, Shiwei Li, Tian Fang, Siyu Zhu, Long Quan | In this paper, we propose an optimal decomposition approach to large-scale multi-view stereo from an initial sparse reconstruction. |
234 | Higher-Order CRF Structural Segmentation of 3D Reconstructed Surfaces | Jingbo Liu, Jinglu Wang, Tian Fang, Chiew-Lan Tai, Long Quan | In this paper, we propose a structural segmentation algorithm to partition multi-view stereo reconstructed surfaces of large-scale urban environments into structural segments. |
235 | Hyperpoints and Fine Vocabularies for Large-Scale Location Recognition | Torsten Sattler, Michal Havlena, Filip Radenovic, Konrad Schindler, Marc Pollefeys | Here we explore an orthogonal strategy, which uses all the 3D points and standard sampling, but performs feature matching implicitly, by quantization into a fine vocabulary. |
236 | Globally Optimal 2D-3D Registration From Points or Lines Without Correspondences | Mark Brown, David Windridge, Jean-Yves Guillemaut | We present a novel approach to 2D-3D registration from points or lines without correspondences. |
237 | The HCI Stereo Metrics: Geometry-Aware Performance Analysis of Stereo Algorithms | Katrin Honauer, Lena Maier-Hein, Daniel Kondermann | We show that the RMS is of limited expressiveness for algorithm selection and introduce the HCI Stereo Metrics. |
238 | Merging the Unmatchable: Stitching Visually Disconnected SfM Models | Andrea Cohen, Torsten Sattler, Marc Pollefeys | In this paper, we present a combinatorial approach for solving this variant by automatically stitching multiple sides of a building together. |
239 | 3D Fragment Reassembly Using Integrated Template Guidance and Fracture-Region Matching | Kang Zhang, Wuyi Yu, Mary Manhein, Warren Waggenspack, Xin Li | We develop a new algorithm to effectively integrate both guidance from a template and from matching of adjacent pieces’ fracture-regions. |
240 | Procedural Editing of 3D Building Point Clouds | Ilke Demir, Daniel G. Aliaga, Bedrich Benes | In our work, we tackle the problem of point cloud completion and editing and we approach it via inverse procedural modeling. |
241 | Semantically-Aware Aerial Reconstruction From Multi-Modal Data | Randi Cabezas, Julian Straub, John W. Fisher III | We propose a probabilistic generative model for inferring semantically-informed aerial reconstructions from multi-modal data within a consistent mathematical framework. We introduce a new multi-modal synthetic dataset in order to provide quantitative performance analysis. |
242 | Guaranteed Outlier Removal for Rotation Search | Alvaro Parra Bustos, Tat-Jun Chin | In this paper, we propose a novel outlier removal technique for rotation search. |
243 | Peeking Template Matching for Depth Extension | Simon Korman, Eyal Ofek, Shai Avidan | We propose a method that extends a given depth image into regions in 3D that are not visible from the point of view of the camera. |
244 | Deformable 3D Fusion: From Partial Dynamic 3D Observations to Complete 4D Models | Weipeng Xu, Mathieu Salzmann, Yongtian Wang, Yue Liu | In this paper, we introduce a template-less 4D reconstruction method that incrementally fuses highly-incomplete 3D observations of a deforming object, and generates a complete, temporally-coherent shape representation of the object. |
245 | Non-Parametric Structure-Based Calibration of Radially Symmetric Cameras | Federico Camposeco, Torsten Sattler, Marc Pollefeys | We propose a novel two-step method for estimating the intrinsic and extrinsic calibration of any radially symmetric camera, including non-central systems. |
246 | Exploiting Object Similarity in 3D Reconstruction | Chen Zhou, Fatma Guney, Yizhou Wang, Andreas Geiger | In this paper, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. |
247 | You Are Here: Mimicking the Human Thinking Process in Reading Floor-Plans | Hang Chu, Dong Ki Kim, Tsuhan Chen | More precisely, we introduce a new and useful task of locating a user in the floor-plan, by using only a camera and a floor-plan without any other prior information. |
248 | MAP Disparity Estimation Using Hidden Markov Trees | Eric T. Psota, Jedrzej Kowalczuk, Mateusz Mittek, Lance C. Perez | A new method is introduced for stereo matching that operates on minimum spanning trees (MSTs) generated from the images. |
249 | Wide Baseline Stereo Matching With Convex Bounded Distortion Constraints | Meirav Galun, Tal Amir, Tal Hassner, Ronen Basri, Yaron Lipman | We introduce a novel method that integrates a deformation model. |
250 | Interactive Visual Hull Refinement for Specular and Transparent Object Surface Reconstruction | Xinxin Zuo, Chao Du, Sen Wang, Jiangbin Zheng, Ruigang Yang | In this paper we present a method of using standard multi-view images for 3D surface reconstruction of non-Lambertian objects. |
251 | Hierarchical Higher-Order Regression Forest Fields: An Application to 3D Indoor Scene Labelling | Trung T. Pham, Ian Reid, Yasir Latif, Stephen Gould | In this work we propose models with higher-order potentials to describe complex relational information from the 3D scenes. |
252 | Classical Scaling Revisited | Gil Shamai, Yonathan Aflalo, Michael Zibulevsky, Ron Kimmel | We present an efficient solver for Classical Scaling (a specific MDS model) by extending the distances measured from a subset of the points to the rest, while exploiting the smoothness property of the distance functions. |
253 | Dense Continuous-Time Tracking and Mapping With Rolling Shutter RGB-D Cameras | Christian Kerl, Jorg Stuckler, Daniel Cremers | We propose a dense continuous-time tracking and mapping method for RGB-D cameras. |
254 | Dense Image Registration and Deformable Surface Reconstruction in Presence of Occlusions and Minimal Texture | Dat Tien Ngo, Sanghyuk Park, Anne Jorstad, Alberto Crivellaro, Chang D. Yoo, Pascal Fua | In this work, we explicitly address the problem of 3D reconstruction of poorly textured, occluded surfaces, proposing a framework based on a template-matching approach that scales dense robust features by a relevancy score. |
255 | The Likelihood-Ratio Test and Efficient Robust Estimation | Andrea Cohen, Christopher Zach | We propose a new approach for jointly optimizing over model parameters and the inlier noise level based on the likelihood ratio test. |
256 | Reflection Modeling for Passive Stereo | Rahul Nair, Andrew Fitzgibbon, Daniel Kondermann, Carsten Rother | These two properties make inference of the model appear prohibitive, but we present evidence that inference is actually possible using a variant of patch match stereo. |
257 | Detailed Full-Body Reconstructions of Moving People From Monocular RGB-D Sequences | Federica Bogo, Michael J. Black, Matthew Loper, Javier Romero | We compare our recovered models with high-resolution scans from a professional system and with avatars created by a commercial product. |
258 | Efficient Solution to the Epipolar Geometry for Radially Distorted Cameras | Zuzana Kukelova, Jan Heller, Martin Bujnak, Andrew Fitzgibbon, Tomas Pajdla | In this paper, we present a new efficient solution to this problem that uses 10 image correspondences. |
259 | Learning a Descriptor-Specific 3D Keypoint Detector | Samuele Salti, Federico Tombari, Riccardo Spezialetti, Luigi Di Stefano | To overcome these shortcomings, we cast 3D keypoint detection as a binary classification between points whose support can be correctly matched by a predefined 3D descriptor or not, thereby learning a descriptor-specific detector that adapts seamlessly to different scenarios. |
260 | Component-Wise Modeling of Articulated Objects | Valsamis Ntouskos, Marta Sanzari, Bruno Cafaro, Federico Nardi, Fabrizio Natola, Fiora Pirri, Manuel Ruiz | We introduce a novel framework for modeling articulated objects based on the aspects of their components. |
261 | A Collaborative Filtering Approach to Real-Time Hand Pose Estimation | Chiho Choi, Ayan Sinha, Joon Hee Choi, Sujin Jang, Karthik Ramani | Inspired by fast and accurate matrix factorization techniques for collaborative filtering, we develop a real-time algorithm for estimating the hand pose from RGB-D data of a commercial depth camera. |
262 | On the Equivalence of Moving Entrance Pupil and Radial Distortion for Camera Calibration | Avinash Kumar, Narendra Ahuja | In this paper, we show using a thick-lens imaging model, that the variation of entrance pupil location as a function of incident image ray angle is directly responsible for radial distortion in captured images. |
263 | A Linear Generalized Camera Calibration From Three Intersecting Reference Planes | Mai Nishimura, Shohei Nobuhara, Takashi Matsuyama, Shinya Shimizu, Kensaku Fujii | This paper presents a new generalized (or ray-pixel, raxel) camera calibration algorithm for camera systems involving distortions by unknown refraction and reflection processes. |
264 | Towards Pointless Structure From Motion: 3D Reconstruction and Camera Parameters From General 3D Curves | Irina Nurutdinova, Andrew Fitzgibbon | We show how 3D curves can be used to refine camera position estimation in challenging low-texture scenes. |
265 | Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose | Seyoung Park, Song-Chun Zhu | In this paper, we are interested in developing compositional models to explicitly represent pose, parts and attributes and to tackle the tasks of attribute recognition, pose estimation and part localization jointly. |
266 | Real-Time Pose Estimation Piggybacked on Object Detection | Roman Juranek, Adam Herout, Marketa Dubska, Pavel Zemcik | We present an object detector coupled with pose estimation directly in a single compact and simple model, where the detector shares extracted image features with the pose estimator. |
267 | Understanding and Predicting Image Memorability at a Large Scale | Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva | Here, we introduce a novel experimental procedure to objectively measure human memory, building the largest annotated image memorability dataset to date (with 60,000 labeled images from a diverse array of sources). |
268 | Multiple Granularity Descriptors for Fine-Grained Categorization | Dequan Wang, Zhiqiang Shen, Jie Shao, Wei Zhang, Xiangyang Xue, Zheng Zhang | We leverage the fact that a subordinate-level object already has other labels in its ontology tree. |
269 | Guiding the Long-Short Term Memory Model for Image Caption Generation | Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars | In this work we focus on the problem of image caption generation. |
270 | Just Noticeable Differences in Visual Attributes | Aron Yu, Kristen Grauman | We explore the problem of predicting “just noticeable differences” in a visual attribute. |
271 | VQA: Visual Question Answering | Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh | We propose the task of free-form and open-ended Visual Question Answering (VQA). We provide a dataset containing 0.25M images, 0.76M questions, and 10M answers (www.visualqa.org), and discuss the information it provides. |
272 | Localize Me Anywhere, Anytime: A Multi-Task Point-Retrieval Approach | Guoyu Lu, Yan Yan, Li Ren, Jingkuan Song, Nicu Sebe, Chandra Kambhamettu | The main contribution of our paper is that we use a 3D model reconstructed by a short video as the query to realize 3D-to-3D localization under a multi-task point retrieval framework. |
273 | Dense Optical Flow Prediction From a Static Image | Jacob Walker, Abhinav Gupta, Martial Hebert | In this work, we present a convolutional neural network (CNN) based approach for motion prediction. |
274 | Unsupervised Domain Adaptation for Zero-Shot Learning | Elyor Kodirov, Tao Xiang, Zhenyong Fu, Shaogang Gong | In this paper a novel ZSL method is proposed based on unsupervised domain adaptation. |
275 | Visual Madlibs: Fill in the Blank Description Generation and Question Answering | Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg | In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. |
276 | Actions and Attributes From Wholes and Parts | Georgia Gkioxari, Ross Girshick, Jitendra Malik | We develop a part-based approach by leveraging convolutional network features inspired by recent advances in computer vision. |
277 | DeepBox: Learning Objectness With Convolutional Networks | Weicheng Kuo, Bharath Hariharan, Jitendra Malik | We argue for a data-driven, semantic approach for ranking object proposals. |
278 | Active Object Localization With Deep Reinforcement Learning | Juan C. Caicedo, Svetlana Lazebnik | We present an active detection model for localizing objects in scenes. |
279 | Scene-Domain Active Part Models for Object Representation | Zhou Ren, Chaohui Wang, Alan L. Yuille | In this paper, we are interested in enhancing the expressivity and robustness of part-based models for object representation, in the common scenario where the training data are based on 2D images. |
280 | A Unified Multiplicative Framework for Attribute Learning | Kongming Liang, Hong Chang, Shiguang Shan, Xilin Chen | In this paper, we propose a unified multiplicative framework for attribute learning, which tackles the key problems. |
281 | Contractive Rectifier Networks for Nonlinear Maximum Margin Classification | Senjian An, Munawar Hayat, Salman H. Khan, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel | To find the optimal nonlinear separating boundary with maximum margin in the input data space, this paper proposes Contractive Rectifier Networks (CRNs), wherein the hidden-layer transformations are restricted to be contraction mappings. |
282 | Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization | Zhe Xu, Shaoli Huang, Ya Zhang, Dacheng Tao | We propose a new method for fine-grained object recognition that employs part-level annotations and deep convolutional neural networks (CNNs) in a unified framework. |
283 | Learning Like a Child: Fast Novel Visual Concept Learning From Sentence Descriptions of Images | Junhua Mao, Xu Wei, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille | In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. |
284 | Learning Common Sense Through Visual Abstraction | Ramakrishna Vedantam, Xiao Lin, Tanmay Batra, C. Lawrence Zitnick, Devi Parikh | Our key insight is that while visual common sense is depicted in visual content, it is the semantic features that are relevant and not low-level pixel information. |
285 | Domain Generalization for Object Recognition With Multi-Task Autoencoders | Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, David Balduzzi | We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. |
286 | Square Localization for Efficient and Accurate Object Detection | Cewu Lu, Yongyi Lu, Hao Chen, Chi-Keung Tang | The key contribution of this paper is the compact square object localization, which relaxes the exhaustive sliding window from testing all windows of different combinations of aspect ratios. |
287 | Box Aggregation for Proposal Decimation: Last Mile of Object Detection | Shu Liu, Cewu Lu, Jiaya Jia | We explain why it works using some statistics in this paper. |
288 | DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers | Amir Ghodrati, Ali Diba, Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool | In this paper we evaluate the quality of the activation layers of a convolutional neural network (CNN) for the generation of object proposals. |
289 | Semantic Segmentation With Object Clique Potential | Xiaojuan Qi, Jianping Shi, Shu Liu, Renjie Liao, Jiaya Jia | In this paper, we propose an object clique potential for semantic segmentation. |
290 | Automatic Concept Discovery From Parallel Text and Visual Corpora | Chen Sun, Chuang Gan, Ram Nevatia | We propose an automatic visual concept discovery algorithm using parallel text and visual corpora; it filters text terms based on the visual discriminative power of the associated images, and groups them into concepts using visual and semantic similarities. |
291 | Simpler Non-Parametric Methods Provide as Good or Better Results to Multiple-Instance Learning | Ragav Venkatesan, Parag Chandakkar, Baoxin Li | In this paper, we analyze the MIL feature space using modified versions of traditional non-parametric techniques like the Parzen window and k-nearest-neighbour, and develop a learning approach employing distances to k-nearest neighbours of a point in the feature space. |
292 | Monocular Object Instance Segmentation and Depth Ordering With CNNs | Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, Raquel Urtasun | In this paper we tackle the problem of instance-level segmentation and depth ordering from a single monocular image. |
293 | Multimodal Convolutional Neural Networks for Matching Image and Sentence | Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li | In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence. |
294 | Structural Kernel Learning for Large Scale Multiclass Object Co-Detection | Zeeshan Hayder, Xuming He, Mathieu Salzmann | Here, we address the problem of multiclass object co-detection for large scale datasets. |
295 | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik | This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains linking mentions of the same entities in images, as well as 276k manually annotated bounding boxes corresponding to each entity. |
296 | Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture | David Eigen, Rob Fergus | In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. |
297 | AttentionNet: Aggregating Weak Directions for Accurate Object Detection | Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S. Paek, In So Kweon | We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. |
298 | Common Subspace for Model and Similarity: Phrase Learning for Caption Generation From Images | Yoshitaka Ushiku, Masataka Yamaguchi, Yusuke Mukuta, Tatsuya Harada | In this paper, we propose a novel phrase-learning method: Common Subspace for Model and Similarity (CoSMoS). |
299 | 3D-Assisted Feature Synthesis for Novel Views of an Object | Hao Su, Fan Wang, Eric Yi, Leonidas J. Guibas | In this paper, given a single input image of an object, we synthesize its features for other views, leveraging an existing modestly-sized 3D model collection of related but not identical objects. To accomplish this, we study the relationship of image patches between different views of the same object, seeking what we call surrogate patches: patches in one view whose feature content predicts well the features of a patch in another view. Based upon these surrogate relationships, we can create feature sets for all views of the latent object on a per patch basis, providing us an augmented multi-view representation of the object. |
300 | Render for CNN: Viewpoint Estimation in Images Using CNNs Trained With Rendered 3D Model Views | Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas | Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs (Convolutional Neural Networks). |
301 | Lost Shopping! Monocular Localization in Large Indoor Spaces | Shenlong Wang, Sanja Fidler, Raquel Urtasun | In this paper we propose a novel approach to localization in very large indoor spaces (i.e., 200+ store shopping malls) that takes a single image and a floor plan of the environment as input. |
302 | Camera Pose Voting for Large-Scale Image-Based Localization | Bernhard Zeisl, Torsten Sattler, Marc Pollefeys | In this work we study the benefits and limitations of spatial verification compared to appearance-based filtering. |
303 | MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking | Thibaut Durand, Nicolas Thome, Matthieu Cord | In this work, we propose a novel Weakly Supervised Learning (WSL) framework dedicated to learning discriminative part detectors from images annotated with a global label. |
304 | DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving | Chenyi Chen, Ari Seff, Alain Kornhauser, Jianxiong Xiao | In this paper, we propose a third paradigm: a direct perception approach to estimate the affordance for driving. |
305 | Active Transfer Learning With Zero-Shot Priors: Reusing Past Datasets for Future Tasks | Efstratios Gavves, Thomas Mensink, Tatiana Tommasi, Cees G. M. Snoek, Tinne Tuytelaars | On this basis we propose an effective active learning algorithm which learns the best possible target classification model with minimum human labeling effort. |
306 | HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition | Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu | In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. |
307 | Learning The Structure of Deep Convolutional Networks | Jiashi Feng, Trevor Darrell | In this work, we develop a novel method for automatically learning aspects of the structure of a deep model, in order to improve its performance, especially when labeled training data are scarce. |
308 | FlowNet: Learning Optical Flow With Convolutional Networks | Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox | In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. |
309 | Learning Semi-Supervised Representation Towards a Unified Optimization Framework for Semi-Supervised Learning | Chun-Guang Li, Zhouchen Lin, Honggang Zhang, Jun Guo | In this paper, we formulate the two stages of SSL into a unified optimization framework, which learns both the affinity matrix and the unknown labels simultaneously. |
310 | Context-Guided Diffusion for Label Propagation on Graphs | Kwang In Kim, James Tompkin, Hanspeter Pfister, Christian Theobalt | Inspired by the success of diffusivity tensors for anisotropic diffusion in image processing, we present anisotropic diffusion on graphs and the corresponding label propagation algorithm. |
311 | Learning to Rank Based on Subsequences | Basura Fernando, Efstratios Gavves, Damien Muselet, Tinne Tuytelaars | In this work we propose MidRank, which learns from moderately sized sub-sequences instead. |
312 | Unsupervised Learning of Visual Representations Using Videos | Xiaolong Wang, Abhinav Gupta | In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNN. |
313 | A Nonparametric Bayesian Approach Toward Stacked Convolutional Independent Component Analysis | Sotirios P. Chatzis, Dimitrios Kosmopoulos | To resolve these issues, in this paper we introduce a convolutional nonparametric Bayesian sparse ICA architecture for overcomplete feature learning from high-dimensional data. |
314 | Robust Principal Component Analysis on Graphs | Nauman Shahid, Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst | In this article, we introduce a new model called ‘Robust PCA on Graphs’ which incorporates spectral graph regularization into the Robust PCA framework. |
315 | Projection Bank: From High-Dimensional Data to Medium-Length Binary Codes | Li Liu, Mengyang Yu, Ling Shao | To target a better balance between computational efficiency and accuracies, in this paper, we propose a novel embedding method called Binary Projection Bank (BPB), which can effectively reduce the very high-dimensional representations to medium-dimensional binary codes without sacrificing accuracies. |
316 | Robust Optimization for Deep Regression | Vasileios Belagiannis, Christian Rupprecht, Gustavo Carneiro, Nassir Navab | In this work, we propose a regression model with ConvNets that achieves robustness to such outliers by minimizing Tukey’s biweight function, an M-estimator robust to outliers, as the loss function for the ConvNet. |
317 | Multi-Class Multi-Annotator Active Learning With Robust Gaussian Process for Visual Recognition | Chengjiang Long, Gang Hua | In this paper, we propose a novel Gaussian process classifier model with multiple annotators for multi-class visual recognition. |
318 | Maximum-Margin Structured Learning With Deep Networks for 3D Human Pose Estimation | Sijin Li, Weichen Zhang, Antoni B. Chan | We test our framework on the Human3.6m dataset and obtain state-of-the-art results compared to other recent methods. |
319 | An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections | Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shi-Fu Chang | We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. |
320 | Additive Nearest Neighbor Feature Maps | Zhenzhen Wang, Xiao-Tong Yuan, Qingshan Liu, Shuicheng Yan | In this paper, we present a concise framework to approximately construct feature maps for nonlinear additive kernels such as the Intersection, Hellinger’s, and Chi^2 kernels. |
321 | Understanding Deep Features With Computer-Generated Imagery | Mathieu Aubry, Bryan C. Russell | We introduce an approach for analyzing the variation of features generated by convolutional neural networks (CNNs) trained on large image datasets with respect to scene factors that occur in natural images. |
322 | Interpolation on the Manifold of K Component GMMs | Hyunwoo J. Kim, Nagesh Adluru, Monami Banerjee, Baba C. Vemuri, Vikas Singh | In this paper, we study the Gaussian mixture model (GMM) representation of the PDFs motivated by its numerous attractive features. |
323 | Context-Aware CNNs for Person Head Detection | Tuan-Hung Vu, Anton Osokin, Ivan Laptev | In this work we focus on detecting human heads in natural scenes. To train and test our model, we introduce a large dataset with 369,846 human heads annotated in 224,740 movie frames. |
324 | Mode-Seeking on Hypergraphs for Robust Geometric Model Fitting | Hanzi Wang, Guobao Xiao, Yan Yan, David Suter | In this paper, we propose a novel geometric model fitting method, called Mode-Seeking on Hypergraphs (MSH), to deal with multi-structure data even in the presence of severe outliers. |
325 | Highly-Expressive Spaces of Well-Behaved Transformations: Keeping It Simple | Oren Freifeld, Soren Hauberg, Kayhan Batmanghelich, John W. Fisher III | We propose novel finite-dimensional spaces of R^n → R^n transformations. |
326 | Entropy-Based Latent Structured Output Prediction | Diane Bouchacourt, Sebastian Nowozin, M. Pawan Kumar | In order to aid their application in computer vision, we study these generalizations with the aim of identifying their strengths and weaknesses. |
327 | Fast Orthogonal Projection Based on Kronecker Product | Xu Zhang, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar, Shengjin Wang, Shi-Fu Chang | We propose a family of structured matrices to speed up orthogonal projections for high-dimensional data commonly seen in computer vision applications. |
328 | PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization | Alex Kendall, Matthew Grimes, Roberto Cipolla | We present a robust and real-time monocular six degree of freedom relocalization system. |
329 | Predicting Multiple Structured Visual Interpretations | Debadeepta Dey, Varun Ramakrishna, Martial Hebert, J. Andrew Bagnell | We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation. |
330 | Look and Think Twice: Capturing Top-Down Visual Attention With Feedback Convolutional Neural Networks | Chunshui Cao, Xianming Liu, Yi Yang, Yinan Yu, Jiang Wang, Zilei Wang, Yongzhen Huang, Liang Wang, Chang Huang, Wei Xu, Deva Ramanan, Thomas S. Huang | In this paper, we will briefly introduce the background of feedbacks in the human visual cortex, which motivates us to develop a computational feedback mechanism in the deep neural networks. |
331 | Matrix Backpropagation for Deep Networks With Structured Layers | Catalin Ionescu, Orestis Vantzos, Cristian Sminchisescu | In this paper we propose a sound mathematical apparatus to formally integrate global structured computation into deep computation architectures. |
332 | Introducing Geometry in Active Learning for Image Segmentation | Ksenia Konyushkova, Raphael Sznitman, Pascal Fua | We propose an Active Learning approach to training a segmentation classifier that exploits geometric priors to streamline the annotation process in 3D image volumes. |
333 | Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition | Heechul Jung, Sihaeng Lee, Junho Yim, Sunjeong Park, Junmo Kim | In this paper, to reduce this effort, a deep learning technique, which is regarded as a tool to automatically extract useful features from raw data, is adopted. |
334 | Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression | Takuya Narihira, Michael Maire, Stella X. Yu | We introduce a new approach to intrinsic image decomposition, the task of decomposing a single image into albedo and shading components. |
335 | Face Flow | Patrick Snape, Anastasios Roussos, Yannis Panagakis, Stefanos Zafeiriou | In this paper, we propose a method for the robust and efficient computation of multi-frame optical flow in an expressive sequence of facial images. |
336 | Discriminative Low-Rank Tracking | Yao Sui, Yafei Tang, Li Zhang | In this work, we exploit the advantages of both approaches to achieve a robust tracker. |
337 | SOWP: Spatially Ordered and Weighted Patch Descriptor for Visual Tracking | Han-Ul Kim, Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim | A simple yet effective object descriptor for visual tracking is proposed in this paper. |
338 | Live Repetition Counting | Ofir Levy, Lior Wolf | Live Repetition Counting |
339 | Near-Online Multi-Target Tracking With Aggregated Local Flow Descriptor | Wongun Choi | In this paper, we tackle two key aspects of the multiple target tracking problem: 1) designing an accurate affinity measure to associate detections and 2) implementing an efficient and accurate (near) online multiple target tracking algorithm. |
340 | Multi-Kernel Correlation Filter for Visual Tracking | Ming Tang, Jiayi Feng | In this paper, we will derive a multi-kernel correlation filter (MKCF) based tracker which fully takes advantage of the invariance-discriminative power spectrums of various features to further improve the performance. |
341 | Joint Probabilistic Data Association Revisited | Seyed Hamid Rezatofighi, Anton Milan, Zhen Zhang, Qinfeng Shi, Anthony Dick, Ian Reid | In this paper, we revisit the joint probabilistic data association (JPDA) technique and propose a novel solution based on recent developments in finding the m-best solutions to an integer linear program. |
342 | Tracking-by-Segmentation With Online Gradient Boosting Decision Tree | Jeany Son, Ilchae Jung, Kayoung Park, Bohyung Han | We propose an online tracking algorithm that adaptively models target appearances based on an online gradient boosting decision tree. |
343 | Exploring Causal Relationships in Visual Object Tracking | Karel Lebeda, Simon Hadfield, Richard Bowden | In this paper we explore these relationships, and provide statistical tools to detect and quantify them; these are based on transfer entropy and stem from information theory. |
344 | Hierarchical Convolutional Features for Visual Tracking | Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang | In this paper, we exploit features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness. |
345 | Robust Non-Rigid Motion Tracking and Surface Reconstruction Using L0 Regularization | Kaiwen Guo, Feng Xu, Yangang Wang, Yebin Liu, Qionghai Dai | We present a new motion tracking method to robustly reconstruct non-rigid geometries and motions from single view depth inputs captured by a consumer depth sensor. |
346 | Online Object Tracking With Proposal Selection | Yang Hua, Karteek Alahari, Cordelia Schmid | In this paper, we address this problem by formulating it as a proposal selection task and making two contributions. |
347 | Understanding and Diagnosing Visual Tracking Systems | Naiyan Wang, Jianping Shi, Dit-Yan Yeung, Jiaya Jia | To address this issue, we propose a framework by breaking a tracker down into five constituent parts, namely, motion model, feature extractor, observation model, model updater, and ensemble post-processor. |
348 | Integrating Dashcam Views Through Inter-Video Mapping | Hsin-I Chen, Yi-Ling Chen, Wei-Tse Lee, Fan Wang, Bing-Yu Chen | In this paper, an inter-video mapping approach is proposed to integrate video footage from two dashcams, one installed on a preceding vehicle and one on the vehicle following it, to provide the illusion that the driver of the following vehicle can see through the preceding one. |
349 | Visual Tracking With Fully Convolutional Networks | Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu | We propose a new approach for general object tracking with a fully convolutional neural network. |
350 | Multiple Feature Fusion via Weighted Entropy for Visual Tracking | Lin Ma, Jiwen Lu, Jianjiang Feng, Jie Zhou | In this paper, we propose a new data-adaptive visual tracking approach by using multiple feature fusion via weighted entropy. |
351 | Pedestrian Travel Time Estimation in Crowded Scenes | Shuai Yi, Hongsheng Li, Xiaogang Wang | In this paper, we target the problem of estimating the statistic of pedestrian travel time within a period from an entrance to a destination in a crowded scene. |
352 | Unsupervised Synchrony Discovery in Human Interaction | Wen-Sheng Chu, Jiabei Zeng, Fernando De la Torre, Jeffrey F. Cohn, Daniel S. Messinger | We present an unsupervised approach to discover interpersonal synchrony, referred to as two or more persons performing common actions in overlapping video frames or segments. |
353 | Efficient Video Segmentation Using Parametric Graph Partitioning | Chen-Ping Yu, Hieu Le, Gregory Zelinsky, Dimitris Samaras | In this work, we propose an efficient and robust video segmentation framework based on parametric graph partitioning (PGP), a fast, almost parameter free graph partitioning method that identifies and removes between-cluster edges to form node clusters. |
354 | Learning to Track for Spatio-Temporal Action Localization | Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid | We propose an effective approach for spatio-temporal action localization in realistic videos. |
355 | Unsupervised Object Discovery and Tracking in Video Collections | Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid | We formulate the problem as a combination of two complementary processes: discovery and tracking. |
356 | Car That Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models | Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, Ashutosh Saxena | We propose an Autoregressive Input-Output HMM to model the contextual information along with the maneuvers. |
357 | Activity Auto-Completion: Predicting Human Activities From Partial Videos | Zhen Xu, Laiyun Qing, Jun Miao | In this paper, we propose an activity auto-completion (AAC) model for human activity prediction by formulating activity prediction as a query auto-completion (QAC) problem in information retrieval. |
358 | Person Re-Identification With Correspondence Structure Learning | Yang Shen, Weiyao Lin, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang | This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. |
359 | Adaptive Exponential Smoothing for Online Filtering of Pixel Prediction Maps | Kang Dang, Jiong Yang, Junsong Yuan | We propose an efficient online video filtering method, called adaptive exponential smoothing (AES), to refine pixel prediction maps. |
360 | P-CNN: Pose-Based CNN Features for Action Recognition | Guilhem Cheron, Ivan Laptev, Cordelia Schmid | To this end we propose a new Pose-based Convolutional Neural Network descriptor (P-CNN) for action recognition. |
361 | Fully Connected Object Proposals for Video Segmentation | Federico Perazzi, Oliver Wang, Markus Gross, Alexander Sorkine-Hornung | We present a novel approach to video segmentation using multiple object proposals. |
362 | Video Segmentation With Just a Few Strokes | Naveen Shankar Nagaraja, Frank R. Schmidt, Thomas Brox | We compare our approach to a diverse set of algorithms in terms of user effort and in terms of performance on common video segmentation benchmarks. |
363 | Actionness-Assisted Recognition of Actions | Ye Luo, Loong-Fah Cheong, An Tran | Actionness-Assisted Recognition of Actions |
364 | COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation | Viet-Quoc Pham, Tatsuo Kozakaya, Osamu Yamaguchi, Ryuzo Okada | This paper presents a patch-based approach for crowd density estimation in public scenes. |
365 | Multi-Cue Structure Preserving MRF for Unconstrained Video Segmentation | Saehoon Yi, Vladimir Pavlovic | We propose a Markov Random Field model for unconstrained video segmentation that relies on tight integration of multiple cues: vertices are defined from contour based superpixels, unary potentials from temporally smooth label likelihood and pairwise potentials from global structure of a video. |
366 | Motion Trajectory Segmentation via Minimum Cost Multicuts | Margret Keuper, Bjoern Andres, Thomas Brox | In this paper, we formulate the segmentation of a video sequence based on point trajectories as a minimum cost multicut problem. |
367 | Action Localization in Videos Through Context Walk | Khurram Soomro, Haroon Idrees, Mubarak Shah | This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. |
368 | RGB-W: When Vision Meets Wireless | Alexandre Alahi, Albert Haque, Li Fei-Fei | Inspired by the recent success of RGB-D cameras, we propose the enrichment of RGB data with an additional “quasi-free” modality, namely, the wireless signal (e.g., wifi or Bluetooth) emitted by individuals’ cell phones, referred to as RGB-W. |
369 | Action Detection by Implicit Intentional Motion Clustering | Wei Chen, Jason J. Corso | We conduct a quantitative analysis of intentional movement, and our findings motivate a new approach for implicit intentional movement extraction that is based on spatiotemporal trajectory clustering by leveraging the properties of intentional movement. |
370 | Simultaneous Foreground Detection and Classification With Hybrid Features | Jaemyun Kim, Adin Ramirez Rivera, Byungyong Ryu, Oksam Chae | In this paper, we propose a hybrid background model that relies on edge and non-edge features of the image to produce the model. |
371 | Training a Feedback Loop for Hand Pose Estimation | Markus Oberweger, Paul Wohlhart, Vincent Lepetit | We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. |
372 | Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose | Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, Jamie Shotton | In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the structure of the parameters and using a local surrogate energy function. |
373 | Panoptic Studio: A Massively Multiview System for Social Motion Capture | Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh | We present an approach to capture the 3D structure and motion of a group of people engaged in a social interaction. |
374 | Where to Buy It: Matching Street Clothing Photos in Online Shops | M. Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg | In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. We collect a new dataset for this application containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos. |
375 | Multi-Task Recurrent Neural Network for Immediacy Prediction | Xiao Chu, Wanli Ouyang, Wei Yang, Xiaogang Wang | In this paper, we propose to predict immediacy for interacting persons from still images. We propose a rich set of immediacy representations that help to predict immediacy from imperfect 1-person and 2-person pose estimation results. |
376 | Learning Complexity-Aware Cascades for Deep Pedestrian Detection | Zhaowei Cai, Mohammad Saberian, Nuno Vasconcelos | Learning Complexity-Aware Cascades for Deep Pedestrian Detection |
377 | Polarized 3D: High-Quality Depth Sensing With Polarization Cues | Achuta Kadambi, Vage Taamazyan, Boxin Shi, Ramesh Raskar | We propose a framework to combine surface normals from polarization (hereafter polarization normals) with an aligned depth map. |
378 | Airborne Three-Dimensional Cloud Tomography | Aviad Levis, Yoav Y. Schechner, Amit Aides, Anthony B. Davis | For light-matter interaction that accounts for multiple-scattering, we use the 3D radiative transfer equation as a forward model. |
379 | Leave-One-Out Kernel Optimization for Shadow Detection | Tomas F. Yago Vicente, Minh Hoai, Dimitris Samaras | The objective of this work is to detect shadows in images. |
380 | Removing Rain From a Single Image via Discriminative Sparse Coding | Yu Luo, Yong Xu, Hui Ji | The paper aims at developing an effective algorithm to remove visual effects of rain from a single rainy image, i.e., to separate the rain layer and the de-rained image layer from a rainy image. |
381 | Mutual-Structure for Joint Filtering | Xiaoyong Shen, Chao Zhou, Li Xu, Jiaya Jia | To address this issue, we propose the concept of mutual-structure, which refers to the structural information that is contained in both images and thus can be safely enhanced by joint filtering, and an untraditional objective function that can be efficiently optimized to yield mutual structure. |
382 | Photometric Stereo in a Scattering Medium | Zak Murez, Tali Treibitz, Ravi Ramamoorthi, David Kriegman | We measure the object point-spread function and introduce a simple deconvolution method. |
383 | Resolving Scale Ambiguity Via XSlit Aspect Ratio Analysis | Wei Yang, Haiting Lin, Sing Bing Kang, Jingyi Yu | In this paper, we show that alternative non-perspective cameras such as the crossed-slit or XSlit cameras exhibit a different depth-dependent aspect ratio (DDAR) property that can be used for 3D recovery. |
384 | Single-Shot Specular Surface Reconstruction With Gonio-Plenoptic Imaging | Lingfei Meng, Liyang Lu, Noah Bedard, Kathrin Berkner | We present a gonio-plenoptic imaging system that realizes a single-shot shape measurement for specular surfaces. |
385 | TransCut: Transparent Object Segmentation From a Light-Field Image | Yichao Xu, Hajime Nagahara, Atsushi Shimada, Rin-ichiro Taniguchi | We propose a method that overcomes these problems using the consistency and distortion properties of a light-field image. |
386 | Depth Recovery From Light Field Using Focal Stack Symmetry | Haiting Lin, Can Chen, Sing Bing Kang, Jingyi Yu | We describe a technique to recover depth from a light field (LF) using two proposed features of the LF focal stack. |
387 | Depth Map Estimation and Colorization of Anaglyph Images Using Local Color Prior and Reverse Intensity Distribution | W. Williem, Ramesh Raskar, In Kyu Park | In this paper, we present a joint iterative anaglyph stereo matching and colorization framework for obtaining a set of disparity maps and colorized images. |
388 | Learning Data-Driven Reflectance Priors for Intrinsic Image Decomposition | Tinghui Zhou, Philipp Krahenbuhl, Alexei A. Efros | We propose a data-driven approach for intrinsic image decomposition, which is the process of inferring the confounding factors of reflectance and shading in an image. |
389 | Photometric Stereo With Small Angular Variations | Jian Wang, Yasuyuki Matsushita, Boxin Shi, Aswin C. Sankaranarayanan | We explore both theoretical justification and practical issues in the design of a compact and portable photometric stereo device on which a camera is surrounded by a ring of point light sources. |
390 | Occlusion-Aware Depth Estimation Using Light-Field Cameras | Ting-Chun Wang, Alexei A. Efros, Ravi Ramamoorthi | Recent work has demonstrated practical methods for passive depth estimation from light-field images. |
391 | Oriented Light-Field Windows for Scene Flow | Pratul P. Srinivasan, Michael W. Tao, Ren Ng, Ravi Ramamoorthi | In this paper, we leverage the recent use of light-field cameras to propose alternative oriented light-field windows that enable more robust and accurate pixel comparisons. |
392 | Extended Depth of Field Catadioptric Imaging Using Focal Sweep | Ryunosuke Yokoya, Shree K. Nayar | In this paper, we use focal sweep to extend the DOF of a catadioptric imaging system. |
393 | Intrinsic Depth: Improving Depth Transfer With Intrinsic Images | Naejin Kong, Michael J. Black | We formulate the estimation of dense depth maps from video sequences as a problem of intrinsic image estimation. |
394 | Separating Fluorescent and Reflective Components by Using a Single Hyperspectral Image | Yinqiang Zheng, Ying Fu, Antony Lam, Imari Sato, Yoichi Sato | This paper introduces a novel method to separate fluorescent and reflective components in the spectral domain. |
395 | Frequency-Based Environment Matting by Compressive Sensing | Yiming Qian, Minglun Gong, Yee-Hong Yang | In this paper, we propose a novel approach to capturing and extracting the matte of a real scene effectively and efficiently. |
396 | Complementary Sets of Shutter Sequences for Motion Deblurring | Hae-Gon Jeon, Joon-Young Lee, Yudeog Han, Seon Joo Kim, In So Kweon | In this paper, we present a novel multi-image motion deblurring method utilizing the coded exposure technique. |
397 | Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior | Lei Zhang, Wei Wei, Yanning Zhang, Fei Li, Chunhua Shen, Qinfeng Shi | To reconstruct a hyperspectral image (HSI) accurately from a few noisy compressive measurements, we present a novel manifold-structured sparsity prior based hyperspectral compressive sensing (HCS) method in this study. |
398 | A Gaussian Process Latent Variable Model for BRDF Inference | Stamatios Georgoulis, Vincent Vanweddingen, Marc Proesmans, Luc Van Gool | In this paper we address the problem of inferring higher order reflectance information starting from the minimal input of a single BRDF slice. |
399 | Active One-Shot Scan for Wide Depth Range Using a Light Field Projector Based on Coded Aperture | Hiroshi Kawasaki, Satoshi Ono, Yuki Horita, Yuki Shiba, Ryo Furukawa, Shinsaku Hiura | In the paper, we solve the problems by introducing a light field projector, which can project a depth-dependent pattern. |
400 | Model-Based Tracking at 300Hz Using Raw Time-of-Flight Observations | Jan Stuhmer, Sebastian Nowozin, Andrew Fitzgibbon, Richard Szeliski, Travis Perry, Sunil Acharya, Daniel Cremers, Jamie Shotton | In this paper, we show how to perform model-based object tracking that reconstructs the object’s depth at an order of magnitude higher frame-rate through simple modifications to an off-the-shelf depth camera. |
401 | Hyperspectral Super-Resolution by Coupled Spectral Unmixing | Charis Lanaras, Emmanuel Baltsavias, Konrad Schindler | In this paper, we propose a method which performs hyperspectral super-resolution by jointly unmixing the two input images into the pure reflectance spectra of the observed materials and the associated mixing coefficients. |
402 | Depth Selective Camera: A Direct, On-Chip, Programmable Technique for Depth Selectivity in Photography | Ryuichi Tadano, Adithya Kumar Pediredla, Ashok Veeraraghavan | In this paper, we show that such correlational sensors can also be used to selectively accept or reject light rays from certain scene depths. |
403 | A Groupwise Multilinear Correspondence Optimization for 3D Faces | Timo Bolkart, Stefanie Wuhrer | Inspired by the minimum description length approach, we propose the first method to jointly optimize a multilinear model and the registration of the 3D scans used for training. |
404 | Selective Encoding for Recognizing Unreliably Localized Faces | Ang Li, Vlad Morariu, Larry S. Davis | We propose a selective encoding framework which injects relevance information (e.g., foreground/background probabilities) into each cluster of a descriptor codebook. |
405 | Confidence Preserving Machine for Facial Action Unit Detection | Jiabei Zeng, Wen-Sheng Chu, Fernando De la Torre, Jeffrey F. Cohn, Zhang Xiong | To address the ubiquity of error, we propose a Confidence Preserving Machine (CPM) that follows an easy-to-hard classification strategy. |
406 | Learning Social Relation Traits From Face Images | Zhanpeng Zhang, Ping Luo, Chen-Change Loy, Xiaoou Tang | Motivated by psychological studies, we investigate if such fine grained and high-level relation traits can be characterised and quantified from face images in the wild. |
407 | Robust Heart Rate Measurement From Video Using Select Random Patches | Antony Lam, Yoshinori Kuno | We present conditions under which cardiac activity extraction from local regions of the face can be treated as a linear Blind Source Separation problem and propose a simple but robust algorithm for selecting good local regions. |
408 | Robust Model-Based 3D Head Pose Estimation | Gregory P. Meyer, Shalini Gupta, Iuri Frosio, Dikpal Reddy, Jan Kautz | We introduce a method for accurate three dimensional head pose estimation using a commodity depth camera. |
409 | Robust Facial Landmark Detection Under Significant Head Poses and Occlusion | Yue Wu, Qiang Ji | In this work, we propose a unified robust cascade regression framework that can handle both images with severe occlusion and images with large head poses. |
410 | Conditional Convolutional Neural Network for Modality-Aware Face Recognition | Chao Xiong, Xiaowei Zhao, Danhang Tang, Karlekar Jayashree, Shuicheng Yan, Tae-Kyun Kim | We propose a conditional Convolutional Neural Network, named c-CNN, to handle multimodal face recognition. |
411 | From Facial Parts Responses to Face Detection: A Deep Learning Approach | Shuo Yang, Ping Luo, Chen-Change Loy, Xiaoou Tang | In this paper, we propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW. |
412 | Efficient PSD Constrained Asymmetric Metric Learning for Person Re-Identification | Shengcai Liao, Stan Z. Li | To address the above issues, we derive a logistic metric learning approach with the PSD constraint and an asymmetric sample weighting strategy. |
413 | Pose-Invariant 3D Face Alignment | Amin Jourabloo, Xiaoming Liu | In order to address these limitations, this paper proposes a novel face alignment algorithm that estimates both 2D and 3D landmarks and their 2D visibilities for a face image with an arbitrary pose. |
414 | From Emotions to Action Units With Hidden and Semi-Hidden-Task Learning | Adria Ruiz, Joost Van de Weijer, Xavier Binefa | In this paper, we investigate how the use of large databases labelled according to the 6 universal facial expressions can increase the generalization ability of Action Unit classifiers. |
415 | Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face | Jungseock Joo, Francis F. Steen, Song-Chun Zhu | In this paper, we study a fully automated system that can infer the perceived traits of a person from his face — social dimensions, such as “intelligence,” “honesty,” and “competence” — and how those traits can be used to predict the outcomes of real-world social events that involve long-term commitments, such as political elections, job hires, and marriage engagements. |
416 | Simultaneous Local Binary Feature Learning and Encoding for Face Recognition | Jiwen Lu, Venice Erin Liong, Jie Zhou | In this paper, we propose a simultaneous local binary feature learning and encoding (SLBFLE) method for face recognition. |
417 | Deep Learning Face Attributes in the Wild | Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang | We propose a novel deep learning framework for attribute prediction in the wild. |
418 | Multi-Task Learning With Low Rank Attribute Embedding for Person Re-Identification | Chi Su, Fan Yang, Shiliang Zhang, Qi Tian, Larry S. Davis, Wen Gao | We propose a novel Multi-Task Learning with Low Rank Attribute Embedding (MTL-LORAE) framework for person re-identification. |
419 | Regressing a 3D Face Shape From a Single Image | Sergey Tulyakov, Nicu Sebe | In this work we present a method to estimate a 3D face shape from a single image. |
420 | Rendering of Eyes for Eye-Shape Registration and Gaze Estimation | Erroll Wood, Tadas Baltrusaitis, Xucong Zhang, Yusuke Sugano, Peter Robinson, Andreas Bulling | We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. |
421 | Multi-Scale Learning for Low-Resolution Person Re-Identification | Xiang Li, Wei-Shi Zheng, Xiaojuan Wang, Tao Xiang, Shaogang Gong | To solve this LR person re-id problem, we propose a novel joint multi-scale learning framework, termed joint multi-scale discriminant component analysis (JUDEA). |
422 | Learning to Transfer: Transferring Latent Task Structures and Its Application to Person-Specific Facial Action Unit Detection | Timur Almaev, Brais Martinez, Michel Valstar | In this article we explore the problem of constructing person-specific models for the detection of facial Action Units (AUs), addressing the problem from the point of view of Transfer Learning and Multi-Task Learning. |
423 | Pairwise Conditional Random Forests for Facial Expression Recognition | Arnaud Dapogny, Kevin Bailly, Severine Dubuisson | In this paper, we propose to learn Random Forests from heterogeneous derivative features (e.g. facial fiducial point movements or texture variations) upon pairs of images. |
424 | Multi-Conditional Latent Variable Model for Joint Facial Action Unit Detection | Stefanos Eleftheriadis, Ognjen Rudovic, Maja Pantic | We propose a novel multi-conditional latent variable model for simultaneous facial feature fusion and detection of facial action units. |
425 | Leveraging Datasets With Varying Annotations for Face Alignment via Deep Regression Network | Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen | In this work, we propose a deep regression network coupled with sparse shape regression (DRN-SSR) to predict the union of all types of landmarks by leveraging datasets with varying annotations, each dataset with one type of annotation. |
426 | A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification | Kan Liu, Bingpeng Ma, Wei Zhang, Rui Huang | In this paper we consider the temporal alignment problem, in addition to the spatial one, and propose a new approach that takes the video of a walking person as input and builds a spatio-temporal appearance representation for pedestrian re-identification. |
427 | Two Birds, One Stone: Jointly Learning Binary Code for Large-Scale Face Image Retrieval and Attributes Prediction | Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan, Xilin Chen | For this purpose, we propose a novel binary code learning framework by jointly encoding identity discriminability and a number of facial attributes into unified binary code. |
428 | An Accurate Iris Segmentation Framework Under Relaxed Imaging Constraints Using Total Variation Model | Zijing Zhao, Kumar Ajay | This paper proposes a novel and more accurate iris segmentation framework to automatically segment the iris region from face images acquired with relaxed imaging under visible or near-infrared illumination, which provides strong feasibility for applications in surveillance, forensics and the search for missing children. |
429 | Discriminative Pose-Free Descriptors for Face and Object Matching | Soubhik Sanyal, Sivaram Prasad Mudunuri, Soma Biswas | In this paper, we propose a discriminative pose-free descriptor (DPFD) which can be used to match faces/objects across pose variations. |
430 | Bi-Shifting Auto-Encoder for Unsupervised Domain Adaptation | Meina Kan, Shiguang Shan, Xilin Chen | To alleviate the discrepancy between source and target domains, we propose a domain adaptation method, named Bi-shifting Auto-Encoder network (BAE). |
431 | Regressive Tree Structured Model for Facial Landmark Localization | Gee-Sern Hsu, Kai-Hsiang Chang, Shih-Chieh Huang | We propose the Regressive Tree Structure Model (RTSM) to improve the run-time speed and localization accuracy. |
432 | Person Recognition in Personal Photo Collections | Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele | We propose a convnet-based person recognition system, for which we provide an in-depth analysis of the informativeness of different body cues, the impact of training data, and the common failure modes of the system. |
433 | Robust Statistical Face Frontalization | Christos Sagonas, Yannis Panagakis, Stefanos Zafeiriou, Maja Pantic | In this paper, we propose a novel method for joint frontal view reconstruction and landmark localization using a small set of frontal images only. |
434 | PIEFA: Personalized Incremental and Ensemble Face Alignment | Xi Peng, Shaoting Zhang, Yu Yang, Dimitris N. Metaxas | To address these limitations, we propose to exploit incremental learning for personalized ensemble alignment. |
435 | Understanding Everyday Hands in Action From RGB-D Images | Gregory Rogez, James S. Supancic III, Deva Ramanan | We introduce a large dataset of 12000 RGB-D images covering 71 everyday grasps in natural interactions. |
436 | Example-Based Modeling of Facial Texture From Deficient Data | Arnaud Dessein, William A. P. Smith, Richard C. Wilson, Edwin R. Hancock | We present an approach to modeling ear-to-ear, high-quality texture from one or more partial views of a face with possibly poor resolution and noise. |
437 | Learning to Predict Saliency on Face Images | Mai Xu, Yun Ren, Zulin Wang | This paper proposes a novel method, which learns to detect saliency of face images. To be more specific, we obtain a database of eye tracking over extensive face images, via conducting an eye tracking experiment. |
438 | Group Membership Prediction | Ziming Zhang, Yuting Chen, Venkatesh Saligrama | In this context we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. |
439 | Extraction of Virtual Baselines From Distorted Document Images Using Curvilinear Projection | Gaofeng Meng, Zuming Huang, Yonghong Song, Shiming Xiang, Chunhong Pan | In this paper, we propose an efficient method for accurate extraction of these virtual visual cues from a curved document image. |
440 | Robust RGB-D Odometry Using Point and Line Features | Yan Lu, Dezhen Song | In experiments we compare our method with state-of-the-art methods including a keypoint-based approach and a dense visual odometry algorithm. |
441 | Learning a Discriminative Model for the Perception of Realism in Composite Images | Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros | In this work, we are answering this question from a data-driven perspective by learning the perception of visual realism directly from large amounts of data. |
442 | What Makes Tom Hanks Look Like Tom Hanks | Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman | We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video. |
443 | Wide-Area Image Geolocalization With Aerial Reference Imagery | Scott Workman, Richard Souvenir, Nathan Jacobs | We propose to use deep convolutional neural networks to address the problem of cross-view image geolocalization, in which the geolocation of a ground-level query image is estimated by matching to georeferenced aerial images. To support training these networks, we introduce a massive database that contains pairs of aerial and ground-level images from across the United States. |
444 | Personalized Age Progression With Aging Dictionary | Xiangbo Shu, Jinhui Tang, Hanjiang Lai, Luoqi Liu, Shuicheng Yan | In this paper, we aim to automatically render aging faces in a personalized way. |
445 | FaceDirector: Continuous Control of Facial Performance in Video | Charles Malleson, Jean-Charles Bazin, Oliver Wang, Derek Bradley, Thabo Beeler, Adrian Hilton, Alexander Sorkine-Hornung | We present a method to continuously blend between multiple facial performances of an actor, which can contain different facial expressions or emotional states. |
446 | Synthesizing Illumination Mosaics From Internet Photo-Collections | Dinghuang Ji, Enrique Dunn, Jan-Michael Frahm | We propose a framework for the automatic creation of time-lapse mosaics of a given scene. |
447 | Hot or Not: Exploring Correlations Between Appearance and Temperature | Daniel Glasner, Pascal Fua, Todd Zickler, Lihi Zelnik-Manor | In this paper we explore interactions between the appearance of an outdoor scene and the ambient temperature. |
448 | SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs | Yu Li, Dongbo Min, Michael S. Brown, Minh N. Do, Jiangbo Lu | This paper proposes a novel algorithm called sped-up PMBP (SPM-BP) to tackle this critical computational bottleneck and speeds up PMBP by 50-100 times. |
449 | Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation | Christian Bailer, Bertram Taetz, Didier Stricker | In this paper we present a dense correspondence field approach that is much less outlier prone and thus much better suited for optical flow estimation than approximate nearest neighbor fields. |
450 | Dense Semantic Correspondence Where Every Pixel is a Classifier | Hilton Bristow, Jack Valmadre, Simon Lucey | We pose the correspondence problem as a graphical model, where the unary potentials are computed via convolution with the set of exemplar classifiers, and the joint potentials enforce smoothly varying correspondence assignment. |
451 | Multi-Image Matching via Fast Alternating Minimization | Xiaowei Zhou, Menglong Zhu, Kostas Daniilidis | In this paper we propose a global optimization-based approach to jointly matching a set of images. |
452 | Differential Recurrent Neural Networks for Action Recognition | Vivek Veeriah, Naifan Zhuang, Guo-Jun Qi | To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes the change in information gain caused by the salient motions between the successive frames. |
453 | Similarity Gaussian Process Latent Variable Model for Multi-Modal Data Analysis | Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian | In this paper, we build our work based on Gaussian process latent variable model (GPLVM) to learn the non-linear non-parametric mapping functions and transform heterogeneous data into a shared latent space. |
454 | Learning Ensembles of Potential Functions for Structured Prediction With Latent Variables | Hossein Hajimirsadeghi, Greg Mori | This paper presents HCRF-Boost, a novel and general framework for learning HCRFs in functional space. |
455 | Simultaneous Deep Transfer Across Domains and Tasks | Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko | We propose a new CNN architecture to exploit unlabeled and sparsely labeled target domain data. |
456 | Low Dimensional Explicit Feature Maps | Ondrej Chum | We propose a novel method of data independent construction of low dimensional feature maps. |
457 | Unsupervised Learning of Spatiotemporally Coherent Metrics | Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun | In this work we study unsupervised feature learning with convolutional networks in the context of temporally coherent unlabeled data. |
458 | Multi-Label Cross-Modal Retrieval | Viresh Ranjan, Nikhil Rasiwasia, C. V. Jawahar | In this work, we address the problem of cross-modal retrieval in the presence of multi-label annotations. |
459 | Improving Ferns Ensembles by Sparsifying and Quantising Posterior Probabilities | Antonio L. Rodriguez, Vitor Sequeira | We introduce a two-fold contribution that produces large reductions in their memory consumption. |
460 | Beyond Gauss: Image-Set Matching on the Riemannian Manifold of PDFs | Mehrtash Harandi, Mathieu Salzmann, Mahsa Baktashmotlagh | Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. |
461 | Unsupervised Domain Adaptation With Imbalanced Cross-Domain Data | Tzu Ming Harry Hsu, Wei Yu Chen, Cheng-An Hou, Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang | To address the aforementioned settings of imbalanced cross-domain data, we propose Closest Common Space Learning (CCSL) for associating such data with the capability of preserving label and structural information within and across domains. |
462 | Secrets of Matrix Factorization: Approximations, Numerics, Manifold Optimization and Random Restarts | Je Hyeong Hong, Andrew Fitzgibbon | This paper provides a unified derivation of a number of recent approaches, so that similarities and differences are easily observed. |
463 | Geometry-Aware Deep Transform | Jiaji Huang, Qiang Qiu, Robert Calderbank, Guillermo Sapiro | In this paper, we propose a novel deep learning objective formulation that unifies both the classification and metric learning criteria. |
464 | Learning Binary Codes for Maximum Inner Product Search | Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, Heng Tao Shen | In this paper, we investigate learning binary codes to exclusively handle the MIPS problem. |
465 | ML-MG: Multi-Label Learning With Missing Labels Using a Mixed Graph | Baoyuan Wu, Siwei Lyu, Bernard Ghanem | To handle missing labels, we propose a unified model of label dependencies by constructing a mixed graph, which jointly incorporates (i) instance-level similarity and class co-occurrence as undirected edges and (ii) semantic label hierarchy as directed edges. |
466 | Zero-Shot Learning via Semantic Similarity Embedding | Ziming Zhang, Venkatesh Saligrama | In this paper we consider a version of the zero-shot learning problem where seen class source and target domain data are provided. |
467 | Bayesian Model Adaptation for Crowd Counts | Bo Liu, Nuno Vasconcelos | A solution based on Bayesian model adaptation of Gaussian processes is proposed. |
468 | An NMF Perspective on Binary Hashing | Lopamudra Mukherjee, Sathya N. Ravi, Vamsi K. Ithapu, Tyler Holmes, Vikas Singh | We give a probabilistic analysis of our initialization scheme and present a range of experiments to show that the method is simple to implement and competes favorably with available methods (both for optimization and generalization). |
469 | Multi-View Domain Generalization for Visual Recognition | Li Niu, Wen Li, Dong Xu | In this paper, we propose a new multi-view domain generalization (MVDG) approach for visual recognition, in which we aim to use the source domain samples with multiple types of features (i.e., multi-view features) to learn robust classifiers that can generalize well to any unseen target domain. |
470 | Infinite Feature Selection | Giorgio Roffo, Simone Melzi, Marco Cristani | In this paper, we propose a feature selection method exploiting the convergence properties of power series of matrices, and introducing the concept of infinite feature selection (Inf-FS). |
471 | Semi-Supervised Zero-Shot Classification With Label Representation Learning | Xin Li, Yuhong Guo, Dale Schuurmans | In this paper, we propose a novel zero-shot classification approach that automatically learns label embeddings from the input data in a semi-supervised large-margin learning framework. |
472 | A Supervised Low-Rank Method for Learning Invariant Subspaces | Farzad Siyahjani, Ranya Almohsen, Sinan Sabri, Gianfranco Doretto | We introduce the invariant components, a discriminative representation invariant to nuisance factors, because it spans subspaces orthogonal to the space where nuisance factors are defined. |
473 | Recursive Frechet Mean Computation on the Grassmannian and its Applications to Computer Vision | Rudrasis Chakraborty, Baba C. Vemuri | In this paper, we propose one such computationally efficient algorithm called the Grassmann inductive Frechet mean estimator (GiFME). |
474 | Multi-View Subspace Clustering | Hongchang Gao, Feiping Nie, Xuelong Li, Heng Huang | In this paper, we propose a novel multi-view subspace clustering method. |
475 | Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions | Jimmy Lei Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov | We present a new model that can classify unseen categories from their textual description. |
476 | Structured Feature Selection | Tian Gao, Ziheng Wang, Qiang Ji | We propose a method for structured feature selection to handle hierarchical features and show the proposed method can lead to big performance gain in facial expression and action unit (AU) recognition tasks. |
477 | Conditional High-Order Boltzmann Machine: A Supervised Learning Model for Relation Learning | Yan Huang, Wei Wang, Liang Wang | In this paper, we explore supervised learning algorithms and propose a new model named Conditional High-order Boltzmann Machine (CHBM), which can be directly used as a bilinear classifier to assign similarity scores for pairwise images. |
478 | Learning Image and User Features for Recommendation in Social Networks | Xue Geng, Hanwang Zhang, Jingwen Bian, Tat-Seng Chua | To address the challenges, we propose a novel deep model which learns the unified feature representations for both users and images. |
479 | Dual-Feature Warping-Based Motion Model Estimation | Shiwei Li, Lu Yuan, Jian Sun, Long Quan | In this paper we propose a simple and effective approach by considering both keypoint and line segment correspondences as data-term. |
480 | An Adaptive Data Representation for Robust Point-Set Registration and Merging | Dylan Campbell, Lars Petersson | This paper presents a framework for rigid point-set registration and merging using a robust continuous data representation. |
481 | Local Subspace Collaborative Tracking | Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie Zhou | To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances. |
482 | Learning Spatially Regularized Correlation Filters for Visual Tracking | Martin Danelljan, Gustav Hager, Fahad Shahbaz Khan, Michael Felsberg | We propose Spatially Regularized Discriminative Correlation Filters (SRDCF) for tracking. |
483 | SpeDo: 6 DOF Ego-Motion Sensor Using Speckle Defocus Imaging | Kensei Jo, Mohit Gupta, Shree K. Nayar | We present a novel ego-motion sensor called SpeDo that addresses these fundamental limitations. |
484 | Unsupervised Trajectory Clustering via Adaptive Multi-Kernel-Based Shrinkage | Hongteng Xu, Yang Zhou, Weiyao Lin, Hongyuan Zha | This paper proposes a shrinkage-based framework for unsupervised trajectory clustering. |
485 | TRIC-track: Tracking by Regression With Incrementally Learned Cascades | Xiaomeng Wang, Michel Valstar, Brais Martinez, Muhammad Haris Khan, Tony Pridmore | This paper proposes a novel approach to part-based tracking by replacing local matching of an appearance model by direct prediction of the displacement between local image patches and part locations. |
486 | Recurrent Network Models for Human Dynamics | Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik | We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. |
487 | Contour Flow: Middle-Level Motion Estimation by Combining Motion Segmentation and Contour Alignment | Huijun Di, Qingxuan Shi, Feng Lv, Ming Qin, Yao Lu | Our goal is to estimate contour flow (the contour pairs with consistent point correspondence) from inconsistent contours extracted independently in two video frames. |
488 | FollowMe: Efficient Online Min-Cost Flow Tracking With Bounded Memory and Computation | Philip Lenz, Andreas Geiger, Raquel Urtasun | In this paper, we address each of these issues, resulting in a computationally and memory-bounded solution. |
489 | Learning to Divide and Conquer for Online Multi-Target Tracking | Francesco Solera, Simone Calderara, Rita Cucchiara | In this paper we claim that the ambiguities in tracking could be solved by a selective use of the features, by working with more reliable features if possible and exploiting a deeper representation of the target only if necessary. |
490 | Minimizing Human Effort in Interactive Tracking by Incremental Learning of Model Parameters | Arridhana Ciptadi, James M. Rehg | We address the problem of minimizing human effort in interactive tracking by learning sequence-specific model parameters. |
491 | A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images | Alberto Crivellaro, Mahdi Rad, Yannick Verdie, Kwang Moo Yi, Pascal Fua, Vincent Lepetit | We present a method that estimates in real-time and under challenging conditions the 3D pose of a known object. |
492 | Linearization to Nonlinear Learning for Visual Tracking | Bo Ma, Hongwei Hu, Jianbing Shen, Yuping Zhang, Fatih Porikli | Building on the theory of globally linear approximations to nonlinear functions, we introduce an elegant method that jointly learns a nonlinear classifier and a visual dictionary for tracking objects in a semi-supervised sparse coding fashion. |
493 | Self-Occlusions and Disocclusions in Causal Video Object Segmentation | Yanchao Yang, Ganesh Sundaramoorthi, Stefano Soatto | We propose a method to detect disocclusion in video sequences of three-dimensional scenes and to partition the disoccluded regions into objects, defined by coherent deformation corresponding to surfaces in the scene. |
494 | Large Displacement 3D Scene Flow With Occlusion Reasoning | Andrei Zanfir, Cristian Sminchisescu | In this paper we propose a novel coarse to fine correspondence-based scene flow approach to account for the effects of large displacements and to model occlusion, based on explicit geometric reasoning. |
495 | Co-Interest Person Detection From Multiple Wearable Camera Videos | Yuewei Lin, Kareem Abdelfatah, Youjie Zhou, Xiaochuan Fan, Hongkai Yu, Hui Qian, Song Wang | In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. We collect three sets of wearable-camera videos for testing the proposed algorithm. |
496 | Sparse Dynamic 3D Reconstruction From Unsynchronized Videos | Enliang Zheng, Dinghuang Ji, Enrique Dunn, Jan-Michael Frahm | Our proposed compressed sensing framework poses the estimation of 3D structure as the problem of dictionary learning. |
497 | Category-Blind Human Action Recognition: A Practical Recognition System | Wenbo Li, Longyin Wen, Mooi Choo Chuah, Siwei Lyu | In this paper, we propose the category-blind human recognition method (CHARM) which can recognize a human action without making assumptions of the action category. |
498 | Temporal Subspace Clustering for Human Motion Segmentation | Sheng Li, Kang Li, Yun Fu | We propose a novel temporal subspace clustering (TSC) approach in this paper. |
499 | Weakly-Supervised Alignment of Video With Text | Piotr Bojanowski, Remi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid | We propose in this paper a method for aligning the two modalities, i.e., automatically providing a time (frame) stamp for every sentence. |
500 | Learning Temporal Embeddings for Complex Video Analysis | Vignesh Ramanathan, Kevin Tang, Greg Mori, Li Fei-Fei | In this paper, we propose to learn temporal embeddings of video frames for complex video analysis. |
501 | Unsupervised Semantic Parsing of Video Collections | Ozan Sener, Amir R. Zamir, Silvio Savarese, Ashutosh Saxena | In this paper, we propose a method for parsing a video into such semantic steps in an unsupervised way. |
502 | Learning Spatiotemporal Features With 3D Convolutional Networks | Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri | We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. |
503 | Temporal Perception and Prediction in Ego-Centric Video | Yipin Zhou, Tamara L. Berg | In this paper we explore two simple tasks related to temporal prediction in egocentric videos of everyday activities. |
504 | Describing Videos by Exploiting Temporal Structure | Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville | In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. |
505 | Person Re-Identification With Discriminatively Trained Viewpoint Invariant Dictionaries | Srikrishna Karanam, Yang Li, Richard J. Radke | This paper introduces a new approach to address the person re-identification problem in cameras with non-overlapping fields of view. |
506 | Storyline Representation of Egocentric Videos With an Applications to Story-Based Search | Bo Xiong, Gunhee Kim, Leonid Sigal | To address this, we propose a storyline representation that expresses an egocentric video as a set of jointly inferred, through MRF inference, story elements comprising of actors, locations, supporting objects and events, depicted on a timeline. |
507 | Sequence to Sequence – Video to Text | Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko | To approach this problem we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. |
508 | Context Aware Active Learning of Activity Recognition Models | Mahmudul Hasan, Amit K. Roy-Chowdhury | In contrast, we formulate a continuous learning framework for context aware activity recognition from unlabeled video data which has two distinct advantages over most existing methods. |
509 | Action Recognition by Hierarchical Mid-Level Action Elements | Tian Lan, Yuke Zhu, Amir Roshan Zamir, Silvio Savarese | We introduce an unsupervised method to generate this representation from videos. |
510 | Selecting Relevant Web Trained Concepts for Automated Event Retrieval | Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis | We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. |
511 | Beyond Covariance: Feature Representation With Nonlinear Kernel Matrices | Lei Wang, Jianjia Zhang, Luping Zhou, Chang Tang, Wanqing Li | It proposes an open framework to use the kernel matrix over feature dimensions as a generic representation and discusses its properties and advantages. |
512 | Multiresolution Hierarchy Co-Clustering for Semantic Segmentation in Sequences With Small Variations | David Varas, Monica Alfaro, Ferran Marques | This paper presents a co-clustering technique that, given a collection of images and their hierarchies, clusters nodes from these hierarchies to obtain a coherent multiresolution representation of the image collection. |
513 | Objects2action: Classifying and Localizing Actions Without Any Video Example | Mihir Jain, Jan C. van Gemert, Thomas Mensink, Cees G. M. Snoek | The goal of this paper is to recognize actions in video without the need for examples. |
514 | Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks | Lin Sun, Kui Jia, Dit-Yan Yeung, Bertram E. Shi | This has triggered us to investigate in this paper a new deep architecture which can handle 3D signals more effectively. |
515 | Bayesian Non-Parametric Inference for Manifold Based MoCap Representation | Fabrizio Natola, Valsamis Ntouskos, Marta Sanzari, Fiora Pirri | We propose a novel approach to human action recognition, with motion capture data (MoCap), based on grouping sub-body parts. |
516 | Semantic Video Entity Linking Based on Visual Content and Metadata | Yuncheng Li, Xitong Yang, Jiebo Luo | In this paper, we propose to exploit video visual content to improve video entity linking. |
517 | Love Thy Neighbors: Image Annotation by Exploiting Image Metadata | Justin Johnson, Lamberto Ballan, Li Fei-Fei | We build on this intuition to improve multilabel image annotation. |
518 | Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-Encoders | Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo | Whereas prior works have approached this problem with heuristic rules or supervised learning, we present an unsupervised learning approach that takes advantage of the abundance of user-edited videos on social media websites such as YouTube. |
519 | Learning Visual Clothing Style With Heterogeneous Dyadic Co-Occurrences | Andreas Veit, Balazs Kovacs, Sean Bell, Julian McAuley, Kavita Bala, Serge Belongie | In this paper, we propose a novel learning framework to help answer these types of questions. |
520 | Text Flow: A Unified Text Detection System in Natural Scene Images | Shangxuan Tian, Yifeng Pan, Chang Huang, Shijian Lu, Kai Yu, Chew Lim Tan | To address these issues, we propose a unified scene text detection system, namely Text Flow, by utilizing the minimum cost (min-cost) flow network model. |
521 | Uncovering Interactions and Interactors: Joint Estimation of Head, Body Orientation and F-Formations From Surveillance Videos | Elisa Ricci, Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulo, Narendra Ahuja, Oswald Lanz | We present a novel approach for jointly estimating targets’ head, body orientations and conversational groups called F-formations from a distant social scene (e.g., a cocktail party captured by surveillance cameras). |
522 | Generating Notifications for Missing Actions: Don’t Forget to Turn the Lights Off! | Bilge Soran, Ali Farhadi, Linda Shapiro | In this paper, we propose a solution to the problem of issuing notifications on actions that may be missed. In order to show a proof of concept, we collected a new egocentric dataset, in which people wear a camera while making lattes. |
523 | Partial Person Re-Identification | Wei-Shi Zheng, Xiang Li, Tao Xiang, Shengcai Liao, Jianhuang Lai, Shaogang Gong | To solve this more challenging and realistic re-id problem without the implicit assumption of manual body-parts alignment, we propose a matching framework consisting of 1) a local patch-level matching model based on a novel sparse representation classification formulation with explicit patch ambiguity modelling, and 2) a global part-based matching model providing complementary spatial layout information. |
524 | Shape Interaction Matrix Revisited and Robustified: Efficient Subspace Clustering With Corrupted and Incomplete Data | Pan Ji, Mathieu Salzmann, Hongdong Li | In this paper, we revisit the SIM and reveal its connections to several recent subspace clustering methods. |
525 | Multiple Hypothesis Tracking Revisited | Chanho Kim, Fuxin Li, Arridhana Ciptadi, James M. Rehg | In order to further utilize the strength of MHT in exploiting higher-order information, we introduce a method for training online appearance models for each track hypothesis. |
526 | Learning to Track: Online Multi-Object Tracking by Decision Making | Yu Xiang, Alexandre Alahi, Silvio Savarese | In this work, we formulate the online MOT problem as decision making in Markov Decision Processes (MDPs), where the lifetime of an object is modeled with an MDP. |