Most Influential ArXiv (Computer Vision and Pattern Recognition) Papers (2024-10)
The field of Computer Vision and Pattern Recognition in arXiv covers image processing, computer vision, pattern recognition, and scene understanding. Roughly, it includes material in ACM Subject Classes I.2.10, I.4, and I.5. The Paper Digest Team analyzes all papers published in this field in recent years and presents up to 30 of the most influential papers for each year. This ranking list is automatically constructed based upon citations from both research papers and granted patents, and will be frequently updated to reflect the most recent changes. To find the latest version of this list or the most influential papers from other conferences/journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won the best paper awards. (Version: 2024-10).
This list is created by the Paper Digest Team.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ArXiv (Computer Vision and Pattern Recognition) Papers (2024-10)
Year | Rank | Paper | Author(s) |
---|---|---|---|
2024 | 1 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (IF: 6). Highlight: In this paper, we show that the reliance on self-attention for visual representation learning is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. | LIANGHUI ZHU et al. |
2024 | 2 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (IF: 5). Highlight: We proposed the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. | Chien-Yao Wang; I-Hau Yeh; Hong-Yuan Mark Liao |
2024 | 3 | VMamba: Visual State Space Model (IF: 5). Highlight: In this paper, we transplant Mamba, a state-space language model, into VMamba, a vision backbone that works in linear time complexity. | YUE LIU et al. |
2024 | 4 | Depth Anything: Unleashing The Power of Large-Scale Unlabeled Data (IF: 5). Highlight: This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. | LIHE YANG et al. |
2024 | 5 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (IF: 5). Highlight: Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. | PATRICK ESSER et al. |
2024 | 6 | A Perspective Analysis of Handwritten Signature Technology (IF: 4). Highlight: After several years of haphazard growth of this research area, it is time to assess its current developments for their applicability in order to draw a structured way forward. This perspective reports a systematic review of the last 10 years of the literature on handwritten signatures with respect to the new scenario, focusing on the most promising domains of research and trying to elicit possible future research directions in this subject. | MOISES DIAZ et al. |
2024 | 7 | Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks (IF: 4). Highlight: We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). | TIANHE REN et al. |
2024 | 8 | How Far Are We to GPT-4V? Closing The Gap to Commercial Multimodal Models with Open-Source Suites (IF: 4). Highlight: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. | ZHE CHEN et al. |
2024 | 9 | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (IF: 4). Highlight: In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. | JIAXIANG TANG et al. |
2024 | 10 | InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model (IF: 4). Highlight: We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. | XIAOYI DONG et al. |
2024 | 11 | 2D Gaussian Splatting for Geometrically Accurate Radiance Fields (IF: 4). Highlight: We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. | Binbin Huang; Zehao Yu; Anpei Chen; Andreas Geiger; Shenghua Gao |
2024 | 12 | Eyes Wide Shut? Exploring The Visual Shortcomings of Multimodal LLMs (IF: 4). Highlight: We further evaluate various CLIP-based vision-and-language models and found a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs. As an initial effort to address these issues, we propose a Mixture of Features (MoF) approach, demonstrating that integrating vision self-supervised learning features with MLLMs can significantly enhance their visual grounding capabilities. | SHENGBANG TONG et al. |
2024 | 13 | VM-UNet: Vision Mamba UNet for Medical Image Segmentation (IF: 4). Highlight: In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). | Jiacheng Ruan; Suncheng Xiang |
2024 | 14 | YOLOv10: Real-Time End-to-End Object Detection (IF: 4). Highlight: In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. | AO WANG et al. |
2024 | 15 | Lumiere: A Space-Time Diffusion Model for Video Generation (IF: 4). Highlight: We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis. | OMER BAR-TAL et al. |
2024 | 16 | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (IF: 4). Highlight: Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. | YIXIN LIU et al. |
2024 | 17 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (IF: 3). Highlight: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). | BRANDON MCKINZIE et al. |
2024 | 18 | Mini-Gemini: Mining The Potential of Multi-modality Vision Language Models (IF: 3). Highlight: In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). | YANWEI LI et al. |
2024 | 19 | InstantID: Zero-shot Identity-Preserving Generation in Seconds (IF: 3). Highlight: Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. | QIXUN WANG et al. |
2024 | 20 | SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation (IF: 3). Highlight: Mamba, as a State Space Model (SSM), recently emerged as a notable manner for long-range dependencies in sequential modeling, excelling in natural language processing field with its remarkable memory efficiency and computational speed. Inspired by its success, we introduce SegMamba, a novel 3D medical image Segmentation Mamba model, designed to effectively capture long-range dependencies within whole volume features at every scale. | Zhaohu Xing; Tian Ye; Yijun Yang; Guang Liu; Lei Zhu |
2024 | 21 | CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network (IF: 3). Highlight: These methods are generally sensitive to noise or outliers since the negative samples are treated equally as the important samples. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. | JIE WEN et al. |
2024 | 22 | A Survey on 3D Gaussian Splatting (IF: 3). Highlight: Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation. | Guikun Chen; Wenguan Wang |
2024 | 23 | Image Processing Based Forest Fire Detection (IF: 3). Highlight: A novel approach for forest fire detection using image processing technique is proposed. | Vipin V |
2024 | 24 | VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models (IF: 3). Highlight: In this work, we explore the training scheme of video models extended from Stable Diffusion and investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model. | HAOXIN CHEN et al. |
2024 | 25 | YOLO-World: Real-Time Open-Vocabulary Object Detection (IF: 3). Highlight: However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. | TIANHENG CHENG et al. |
2024 | 26 | VideoMamba: State Space Model for Efficient Video Understanding (IF: 3). Highlight: Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain. | KUNCHANG LI et al. |
2024 | 27 | MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (IF: 3). Highlight: In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs. | BIN LIN et al. |
2024 | 28 | Retrieval-Augmented Generation for AI-Generated Content: A Survey (IF: 3). Highlight: In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios. | PENGHAO ZHAO et al. |
2024 | 29 | Dynamically Enhanced Static Handwriting Representation for Parkinson’s Disease Detection (IF: 3). Highlight: In this paper, the discriminating power of dynamically enhanced static images of handwriting is investigated. | Moises Diaz; Miguel Angel Ferrer; Donato Impedovo; Giuseppe Pirlo; Gennaro Vessio |
2024 | 30 | Detecting Respiratory Pathologies Using Convolutional Neural Networks and Variational Autoencoders for Unbalancing Data (IF: 3). Highlight: The aim of this paper was the detection of pathologies through respiratory sounds. | María Teresa García-Ordás; José Alberto Benítez-Andrades; Isaías García-Rodríguez; Carmen Benavides; Héctor Alaiz-Moretón |
2023 | 1 | Segment Anything (IF: 8). Highlight: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. | ALEXANDER KIRILLOV et al. |
2023 | 2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (IF: 8). Highlight: This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. | Junnan Li; Dongxu Li; Silvio Savarese; Steven Hoi |
2023 | 3 | Adding Conditional Control to Text-to-Image Diffusion Models (IF: 8). Highlight: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. | Lvmin Zhang; Anyi Rao; Maneesh Agrawala |
2023 | 4 | Visual Instruction Tuning (IF: 8). Highlight: In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. | Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee |
2023 | 5 | DINOv2: Learning Robust Visual Features Without Supervision (IF: 8). Highlight: In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. | MAXIME OQUAB et al. |
2023 | 6 | Improved Baselines with Visual Instruction Tuning (IF: 8). Highlight: In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. | Haotian Liu; Chunyuan Li; Yuheng Li; Yong Jae Lee |
2023 | 7 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (IF: 8). Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. | Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny |
2023 | 8 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (IF: 8). Highlight: In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. | WENLIANG DAI et al. |
2023 | 9 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (IF: 8). Highlight: We present SDXL, a latent diffusion model for text-to-image synthesis. | DUSTIN PODELL et al. |
2023 | 10 | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (IF: 8). Highlight: In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. | SHILONG LIU et al. |
2023 | 11 | ScaleNet: An Unsupervised Representation Learning Method for Limited Information (IF: 7). Highlight: A simple and efficient unsupervised representation learning method named ScaleNet based on multi-scale images is proposed in this study to enhance the performance of ConvNets when limited information is available. | Huili Huang; M. Mahdi Roozbahani |
2023 | 12 | Zero-1-to-3: Zero-shot One Image to 3D Object (IF: 7). Highlight: We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. | RUOSHI LIU et al. |
2023 | 13 | Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (IF: 7). Highlight: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. | ANDREAS BLATTMANN et al. |
2023 | 14 | T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models (IF: 7). Highlight: However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate controlling (e.g., color and structure) is needed. In this paper, we aim to “dig out” the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. | CHONG MOU et al. |
2023 | 15 | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (IF: 7). Highlight: We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. | RENRUI ZHANG et al. |
2023 | 16 | ImageBind: One Embedding Space To Bind Them All (IF: 7). Highlight: We present ImageBind, an approach to learn a joint embedding across six different modalities – images, text, audio, depth, thermal, and IMU data. | ROHIT GIRDHAR et al. |
2023 | 17 | Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (IF: 7). Highlight: We design a series of prompts to inject the visual model information into ChatGPT, considering models of multiple inputs/outputs and models that require visual feedback. | CHENFEI WU et al. |
2023 | 18 | Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting (IF: 6). Highlight: We introduce Argoverse 2 (AV2) – a collection of three datasets for perception and forecasting research in the self-driving domain. | BENJAMIN WILSON et al. |
2023 | 19 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (IF: 6). Highlight: In this paper, we present LLaMA-Adapter V2, a parameter-efficient visual instruction model. | PENG GAO et al. |
2023 | 20 | MMBench: Is Your Multi-modal Model An All-around Player? (IF: 6). Highlight: Meanwhile, subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model’s abilities by incorporating human labor, which is not scalable and may display significant bias. In response to these challenges, we propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs. | YUAN LIU et al. |
2023 | 21 | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (IF: 6). Highlight: In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. | SANGHYUN WOO et al. |
2023 | 22 | Otter: A Multi-Modal Model with In-Context Instruction Tuning (IF: 6). Highlight: In this paper, we propose to introduce instruction tuning into multi-modal models, motivated by the Flamingo model’s upstream interleaved format pretraining dataset. | BO LI et al. |
2023 | 23 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (IF: 6). Highlight: In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. | ZHENGYUAN YANG et al. |
2023 | 24 | Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation (IF: 6). Highlight: In this work, we propose a new method of Fantasia3D for high-quality text-to-3D content creation. | Rui Chen; Yongwei Chen; Ningxin Jiao; Kui Jia |
2023 | 25 | Scaling Vision Transformers to 22 Billion Parameters (IF: 6). Highlight: We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. | MOSTAFA DEHGHANI et al. |
2023 | 26 | Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (IF: 6). Highlight: In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. | ANDREAS BLATTMANN et al. |
2023 | 27 | MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models (IF: 6). Highlight: However, it is difficult for these case studies to fully reflect the performance of MLLM, lacking a comprehensive evaluation. In this paper, we fill in this blank, presenting the first comprehensive MLLM Evaluation benchmark MME. It measures both perception and cognition abilities on a total of 14 subtasks. | CHAOYOU FU et al. |
2023 | 28 | Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic (IF: 6). Highlight: This natural referential ability in dialogue remains absent in current Multimodal Large Language Models (MLLMs). To fill this gap, this paper proposes an MLLM called Shikra, which can handle spatial coordinate inputs and outputs in natural language. | KEQIN CHEN et al. |
2023 | 29 | GLIGEN: Open-Set Grounded Text-to-Image Generation (IF: 6). Highlight: However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. | YUHENG LI et al. |
2023 | 30 | A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS (IF: 6). Highlight: We present a comprehensive analysis of YOLO’s evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. | Juan Terven; Diana Cordova-Esparza |
2022 | 1 | Hierarchical Text-Conditional Image Generation with CLIP Latents (IF: 9). Highlight: Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. | Aditya Ramesh; Prafulla Dhariwal; Alex Nichol; Casey Chu; Mark Chen |
2022 | 2 | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (IF: 9). Highlight: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. | CHITWAN SAHARIA et al. |
2022 | 3 | YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors (IF: 8). Abstract: YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object … | Chien-Yao Wang; Alexey Bochkovskiy; Hong-Yuan Mark Liao |
2022 | 4 | A ConvNet for The 2020s (IF: 8). Highlight: In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. | ZHUANG LIU et al. |
2022 | 5 | Instant Neural Graphics Primitives with A Multiresolution Hash Encoding (IF: 8). Highlight: Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. | Thomas Müller; Alex Evans; Christoph Schied; Alexander Keller |
2022 | 6 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (IF: 8). Highlight: In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. | Junnan Li; Dongxu Li; Caiming Xiong; Steven Hoi |
2022 | 7 | Flamingo: A Visual Language Model for Few-Shot Learning (IF: 8). Highlight: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. | JEAN-BAPTISTE ALAYRAC et al. |
2022 | 8 | LAION-5B: An Open Large-scale Dataset for Training Next Generation Image-text Models (IF: 8). Highlight: Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B – a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. | CHRISTOPH SCHUHMANN et al. |
2022 | 9 | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (IF: 8). Highlight: In this work, we present a new approach for personalization of text-to-image diffusion models. | NATANIEL RUIZ et al. |
2022 | 10 | DreamFusion: Text-to-3D Using 2D Diffusion (IF: 8). Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. | Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall |
2022 | 11 | An Image Is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion (IF: 8). Highlight: In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. | RINON GAL et al. |
2022 | 12 | Prompt-to-Prompt Image Editing with Cross Attention Control (IF: 8). Highlight: In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. | AMIR HERTZ et al. |
2022 | 13 | Elucidating The Design Space of Diffusion-Based Generative Models (IF: 8). Highlight: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. | Tero Karras; Miika Aittala; Timo Aila; Samuli Laine |
2022 | 14 | InstructPix2Pix: Learning to Follow Image Editing Instructions (IF: 8). Highlight: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. | Tim Brooks; Aleksander Holynski; Alexei A. Efros |
2022 | 15 | Imagen Video: High Definition Video Generation with Diffusion Models (IF: 8). Highlight: We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. | JONATHAN HO et al. |
2022 | 16 | Visual Prompt Tuning (IF: 8). Highlight: This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. | MENGLIN JIA et al. |
2022 | 17 | CoCa: Contrastive Captioners Are Image-Text Foundation Models (IF: 8). Highlight: This paper presents Contrastive Captioner (CoCa), a minimalist design to pretrain an image-text encoder-decoder foundation model jointly with contrastive loss and captioning loss, thereby subsuming model capabilities from contrastive approaches like CLIP and generative methods like SimVLM. | JIAHUI YU et al. |
2022 | 18 | YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications (IF: 8). Highlight: The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. |
CHUYI LI et. al. |
2022 | 19 | RePaint: Inpainting Using Denoising Diffusion Probabilistic Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. |
ANDREAS LUGMAYR et. al. |
2022 | 20 | Video Diffusion Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. |
JONATHAN HO et. al. |
2022 | 21 | TensoRF: Tensorial Radiance Fields IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TensoRF, a novel approach to model and reconstruct radiance fields. |
Anpei Chen; Zexiang Xu; Andreas Geiger; Jingyi Yu; Hao Su; |
2022 | 22 | Make-A-Video: Text-to-Video Generation Without Text-Video Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Make-A-Video — an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). |
URIEL SINGER et. al. |
2022 | 23 | Conditional Prompt Learning for Vision-Language Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; |
2022 | 24 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. |
HAO ZHANG et. al. |
2022 | 25 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. |
ZHIQI LI et. al. |
2022 | 26 | Scaling Autoregressive Models for Content-Rich Text-to-Image Generation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. |
JIAHUI YU et. al. |
2022 | 27 | Magic3D: High-Resolution Text-to-3D Content Creation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. |
CHEN-HSUAN LIN et. al. |
2022 | 28 | Scalable Diffusion Models with Transformers IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. |
William Peebles; Saining Xie; |
2022 | 29 | VideoMAE: Masked Autoencoders Are Data-Efficient Learners for Self-Supervised Video Pre-Training IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). |
Zhan Tong; Yibing Song; Jue Wang; Limin Wang; |
2022 | 30 | Imagic: Text-Based Real Image Editing with Diffusion Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image. |
BAHJAT KAWAR et. al. |
2021 | 1 | Learning Transferable Visual Models From Natural Language Supervision IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. |
ALEC RADFORD et. al. |
2021 | 2 | Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. |
ZE LIU et. al. |
2021 | 3 | High-Resolution Image Synthesis with Latent Diffusion Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. |
Robin Rombach; Andreas Blattmann; Dominik Lorenz; Patrick Esser; Björn Ommer; |
2021 | 4 | Masked Autoencoders Are Scalable Vision Learners IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. |
KAIMING HE et. al. |
2021 | 5 | Emerging Properties in Self-Supervised Vision Transformers IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). |
MATHILDE CARON et. al. |
2021 | 6 | Zero-Shot Text-to-Image Generation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. |
ADITYA RAMESH et. al. |
2021 | 7 | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. |
ENZE XIE et. al. |
2021 | 8 | YOLOX: Exceeding YOLO Series in 2021 IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector — YOLOX. |
Zheng Ge; Songtao Liu; Feng Wang; Zeming Li; Jian Sun; |
2021 | 9 | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. |
WENHAI WANG et. al. |
2021 | 10 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. |
CHAO JIA et. al. |
2021 | 11 | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. |
ALEX NICHOL et. al. |
2021 | 12 | TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. |
JIENENG CHEN et. al. |
2021 | 13 | BEiT: BERT Pre-Training of Image Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. |
Hangbo Bao; Li Dong; Songhao Piao; Furu Wei; |
2021 | 14 | Coordinate Attention for Efficient Mobile Network Design IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call coordinate attention. |
Qibin Hou; Daquan Zhou; Jiashi Feng; |
2021 | 15 | MLP-Mixer: An All-MLP Architecture for Vision IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. |
ILYA TOLSTIKHIN et. al. |
2021 | 16 | EfficientNetV2: Smaller Models and Faster Training IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. |
Mingxing Tan; Quoc V. Le; |
2021 | 17 | Barlow Twins: Self-Supervised Learning Via Redundancy Reduction IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. |
Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stéphane Deny; |
2021 | 18 | Transformers in Vision: A Survey IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. |
SALMAN KHAN et. al. |
2021 | 19 | ViViT: A Video Vision Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification. |
ANURAG ARNAB et. al. |
2021 | 20 | Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. |
LI YUAN et. al. |
2021 | 21 | Is Space-Time Attention All You Need for Video Understanding? IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a convolution-free approach to video classification built exclusively on self-attention over space and time. |
Gedas Bertasius; Heng Wang; Lorenzo Torresani; |
2021 | 22 | Masked-attention Mask Transformer for Universal Image Segmentation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). |
Bowen Cheng; Ishan Misra; Alexander G. Schwing; Alexander Kirillov; Rohit Girdhar; |
2021 | 23 | CvT: Introducing Convolutions to Vision Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. |
HAIPING WU et. al. |
2021 | 24 | Align Before Fuse: Vision and Language Representation Learning with Momentum Distillation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a contrastive loss to ALign the image and text representations BEfore Fusing (ALBEF) them through cross-modal attention, which enables more grounded vision and language representation learning. |
JUNNAN LI et. al. |
2021 | 25 | Learning to Prompt for Vision-Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming — one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; |
2021 | 26 | Restormer: Efficient Transformer for High-Resolution Image Restoration IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. |
SYED WAQAS ZAMIR et. al. |
2021 | 27 | An Empirical Study of Training Self-Supervised Vision Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper does not describe a novel method. |
Xinlei Chen; Saining Xie; Kaiming He; |
2021 | 28 | Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. |
JONATHAN T. BARRON et. al. |
2021 | 29 | Alias-Free Generative Adversarial Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. |
TERO KARRAS et. al. |
2021 | 30 | NeuS: Learning Neural Implicit Surfaces By Volume Rendering for Multi-view Reconstruction IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. |
PENG WANG et. al. |
2020 | 1 | An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. |
ALEXEY DOSOVITSKIY et. al. |
2020 | 2 | End-to-End Object Detection With Transformers IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new method that views object detection as a direct set prediction problem. |
NICOLAS CARION et. al. |
2020 | 3 | YOLOv4: Optimal Speed And Accuracy Of Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. |
Alexey Bochkovskiy; Chien-Yao Wang; Hong-Yuan Mark Liao; |
2020 | 4 | Training Data-efficient Image Transformers & Distillation Through Attention IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we produce a competitive convolution-free transformer by training on Imagenet only. |
HUGO TOUVRON et. al. |
2020 | 5 | Deformable DETR: Deformable Transformers for End-to-End Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. |
XIZHOU ZHU et. al. |
2020 | 6 | NeRF: Representing Scenes As Neural Radiance Fields For View Synthesis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. |
BEN MILDENHALL et. al. |
2020 | 7 | Unsupervised Learning of Visual Features By Contrasting Cluster Assignments IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. |
MATHILDE CARON et. al. |
2020 | 8 | Exploring Simple Siamese Representation Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. |
Xinlei Chen; Kaiming He; |
2020 | 9 | Improved Baselines With Momentum Contrastive Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. |
Xinlei Chen; Haoqi Fan; Ross Girshick; Kaiming He; |
2020 | 10 | Rethinking Semantic Segmentation from A Sequence-to-Sequence Perspective with Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. |
SIXIAO ZHENG et. al. |
2020 | 11 | Image Segmentation Using Deep Learning: A Survey IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. |
SHERVIN MINAEE et. al. |
2020 | 12 | Taming Transformers for High-Resolution Image Synthesis IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers and obtain the state of the art among autoregressive models on class-conditional ImageNet. |
Patrick Esser; Robin Rombach; Björn Ommer; |
2020 | 13 | Implicit Neural Representations With Periodic Activation Functions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. |
Vincent Sitzmann; Julien N. P. Martel; Alexander W. Bergman; David B. Lindell; Gordon Wetzstein; |
2020 | 14 | RAFT: Recurrent All-Pairs Field Transforms For Optical Flow IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow. |
Zachary Teed; Jia Deng; |
2020 | 15 | Fourier Features Let Networks Learn High Frequency Functions In Low Dimensional Domains IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities. |
MATTHEW TANCIK et. al. |
2020 | 16 | Face2Face: Real-time Face Capture And Reenactment Of RGB Videos IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Face2Face, a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). |
Justus Thies; Michael Zollhöfer; Marc Stamminger; Christian Theobalt; Matthias Nießner; |
2020 | 17 | Oscar: Object-Semantics Aligned Pre-training For Vision-Language Tasks IF:8 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks. While existing methods simply … |
XIUJUN LI et. al. |
2020 | 18 | Shortcut Learning in Deep Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this perspective we seek to distill how many of deep learning’s problems can be seen as different symptoms of the same underlying problem: shortcut learning. |
ROBERT GEIRHOS et. al. |
2020 | 19 | Training Generative Adversarial Networks With Limited Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. |
TERO KARRAS et. al. |
2020 | 20 | Point Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. |
Nico Engel; Vasileios Belagiannis; Klaus Dietmayer; |
2020 | 21 | Designing Network Design Spaces IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a new network design paradigm. |
Ilija Radosavovic; Raj Prateek Kosaraju; Ross Girshick; Kaiming He; Piotr Dollár; |
2020 | 22 | PixelNeRF: Neural Radiance Fields from One or Few Images IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. |
Alex Yu; Vickie Ye; Matthew Tancik; Angjoo Kanazawa; |
2020 | 23 | Pre-Trained Image Processing Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the low-level computer vision task (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, image processing transformer (IPT). |
HANTING CHEN et. al. |
2020 | 24 | The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. |
DAN HENDRYCKS et. al. |
2020 | 25 | Deep Learning for Person Re-identification: A Survey and Outlook IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. |
MANG YE et. al. |
2020 | 26 | ResNeSt: Split-Attention Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple and modular Split-Attention block that enables attention across feature-map groups. |
HANG ZHANG et. al. |
2020 | 27 | Center-based 3D Object Detection and Tracking IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we instead propose to represent, detect, and track 3D objects as points. |
Tianwei Yin; Xingyi Zhou; Philipp Krähenbühl; |
2020 | 28 | PCT: Point Cloud Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. |
MENG-HAO GUO et. al. |
2020 | 29 | NeRF in The Wild: Neural Radiance Fields for Unconstrained Photo Collections IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. |
RICARDO MARTIN-BRUALLA et. al. |
2020 | 30 | D-NeRF: Neural Radiance Fields For Dynamic Scenes IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain, allowing to reconstruct and render novel images of objects under rigid and non-rigid motions from a single camera moving around the scene. |
Albert Pumarola; Enric Corona; Gerard Pons-Moll; Francesc Moreno-Noguer; |
2019 | 1 | Momentum Contrast For Unsupervised Visual Representation Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. |
Kaiming He; Haoqi Fan; Yuxin Wu; Saining Xie; Ross Girshick; |
2019 | 2 | Searching For MobileNetV3 IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. |
ANDREW HOWARD et. al. |
2019 | 3 | FCOS: Fully Convolutional One-Stage Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. |
Zhi Tian; Chunhua Shen; Hao Chen; Tong He; |
2019 | 4 | CutMix: Regularization Strategy To Train Strong Classifiers With Localizable Features IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. |
SANGDOO YUN et al. |
2019 | 5 | EfficientDet: Scalable And Efficient Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. |
Mingxing Tan; Ruoming Pang; Quoc V. Le; |
2019 | 6 | Deep High-Resolution Representation Learning For Human Pose Estimation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. |
Ke Sun; Bin Xiao; Dong Liu; Jingdong Wang; |
2019 | 7 | Generalized Intersection Over Union: A Metric And A Loss For Bounding Box Regression IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. |
HAMID REZATOFIGHI et al. |
2019 | 8 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations For Vision-and-Language Tasks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. |
Jiasen Lu; Dhruv Batra; Devi Parikh; Stefan Lee; |
2019 | 9 | DeepSDF: Learning Continuous Signed Distance Functions For Shape Representation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. |
Jeong Joon Park; Peter Florence; Julian Straub; Richard Newcombe; Steven Lovegrove; |
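2019 | 10 | RandAugment: Practical Automated Data Augmentation With A Reduced Search Space IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we remove both of these obstacles: the separate search phase that automated augmentation methods require, and their inability to adjust regularization strength to model or dataset size. |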
Ekin D. Cubuk; Barret Zoph; Jonathon Shlens; Quoc V. Le; |
2019 | 11 | Deep High-Resolution Representation Learning For Visual Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. |
JINGDONG WANG et al. |
2019 | 12 | Objects As Points IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take a different approach: we model an object as a single point, the center point of its bounding box. |
Xingyi Zhou; Dequan Wang; Philipp Krähenbühl; |
2019 | 13 | ECA-Net: Efficient Channel Attention For Deep Convolutional Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. |
QILONG WANG et al. |
2019 | 14 | Distance-IoU Loss: Faster And Better Learning For Bounding Box Regression IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. |
ZHAOHUI ZHENG et al. |
2019 | 15 | CSPNet: A New Backbone That Can Enhance Learning Capability Of CNN IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate, from the network architecture perspective, the heavy inference computation that previous works require. |
CHIEN-YAO WANG et al. |
2019 | 16 | MMDetection: Open MMLab Detection Toolbox And Benchmark IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the various features of this toolbox. |
KAI CHEN et al. |
2019 | 17 | Semantic Image Synthesis With Spatially-Adaptive Normalization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. |
Taesung Park; Ming-Yu Liu; Ting-Chun Wang; Jun-Yan Zhu; |
2019 | 18 | CenterNet: Keypoint Triplets For Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. |
KAIWEN DUAN et al. |
2019 | 19 | Scalability In Perception For Autonomous Driving: Waymo Open Dataset IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. |
PEI SUN et al. |
2019 | 20 | CheXpert: A Large Chest Radiograph Dataset With Uncertainty Labels And Expert Comparison IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. |
JEREMY IRVIN et al. |
2019 | 21 | KPConv: Flexible And Deformable Convolution For Point Clouds IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Kernel Point Convolution (KPConv), a new design of point convolution that operates on point clouds without any intermediate representation. |
HUGUES THOMAS et al. |
2019 | 22 | Contrastive Multiview Coding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. |
Yonglong Tian; Dilip Krishnan; Phillip Isola; |
2019 | 23 | GhostNet: More Features From Cheap Operations IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel Ghost module to generate more feature maps from cheap operations. |
KAI HAN et al. |
2019 | 24 | Res2Net: A New Multi-scale Backbone Architecture IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. |
SHANG-HUA GAO et al. |
2019 | 25 | A Survey Of The Recent Architectures Of Deep Convolutional Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This survey thus focuses on the intrinsic taxonomy present in the recently reported deep CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. |
Asifullah Khan; Anabia Sohail; Umme Zahoora; Aqsa Saeed Qureshi; |
2019 | 26 | Class-Balanced Loss Based On Effective Number Of Samples IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. |
Yin Cui; Menglin Jia; Tsung-Yi Lin; Yang Song; Serge Belongie; |
2019 | 27 | UNITER: UNiversal Image-TExt Representation Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. |
YEN-CHUN CHEN et al. |
2019 | 28 | Selective Kernel Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. |
Xiang Li; Wenhai Wang; Xiaolin Hu; Jian Yang; |
2019 | 29 | VisualBERT: A Simple And Performant Baseline For Vision And Language IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. |
Liunian Harold Li; Mark Yatskar; Da Yin; Cho-Jui Hsieh; Kai-Wei Chang; |
2019 | 30 | FaceForensics++: Learning To Detect Manipulated Facial Images IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. |
ANDREAS RÖSSLER et al. |
2018 | 1 | YOLOv3: An Incremental Improvement IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present some updates to YOLO! |
Joseph Redmon; Ali Farhadi; |
2018 | 2 | MobileNetV2: Inverted Residuals And Linear Bottlenecks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. |
Mark Sandler; Andrew Howard; Menglong Zhu; Andrey Zhmoginov; Liang-Chieh Chen; |
2018 | 3 | CBAM: Convolutional Block Attention Module IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. |
Sanghyun Woo; Jongchan Park; Joon-Young Lee; In So Kweon; |
2018 | 4 | Encoder-Decoder With Atrous Separable Convolution For Semantic Image Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to combine the advantages of spatial pyramid pooling modules and encoder-decoder structures. |
Liang-Chieh Chen; Yukun Zhu; George Papandreou; Florian Schroff; Hartwig Adam; |
2018 | 5 | The Unreasonable Effectiveness Of Deep Features As A Perceptual Metric IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To answer these questions, we introduce a new dataset of human perceptual similarity judgments. |
Richard Zhang; Phillip Isola; Alexei A. Efros; Eli Shechtman; Oliver Wang; |
2018 | 6 | ArcFace: Additive Angular Margin Loss For Deep Face Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. We release all refined training data, training codes, pre-trained models and training logs, which will help reproduce the results in this paper. |
Jiankang Deng; Jia Guo; Niannan Xue; Stefanos Zafeiriou; |
2018 | 7 | Dynamic Graph CNN For Learning On Point Clouds IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. |
YUE WANG et al. |
2018 | 8 | UNet++: A Nested U-Net Architecture For Medical Image Segmentation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. |
Zongwei Zhou; Md Mahfuzur Rahman Siddiquee; Nima Tajbakhsh; Jianming Liang; |
2018 | 9 | Path Aggregation Network For Instance Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in proposal-based instance segmentation framework. |
Shu Liu; Lu Qi; Haifang Qin; Jianping Shi; Jiaya Jia; |
2018 | 10 | Dual Attention Network For Scene Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. |
JUN FU et al. |
2018 | 11 | ShuffleNet V2: Practical Guidelines For Efficient CNN Architecture Design IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. |
Ningning Ma; Xiangyu Zhang; Hai-Tao Zheng; Jian Sun; |
2018 | 12 | OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. |
Zhe Cao; Gines Hidalgo; Tomas Simon; Shih-En Wei; Yaser Sheikh; |
2018 | 13 | Attention U-Net: Learning Where To Look For The Pancreas IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. |
OZAN OKTAY et al. |
2018 | 14 | Image Super-Resolution Using Very Deep Residual Channel Attention Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve these problems, we propose the very deep residual channel attention networks (RCAN). |
YULUN ZHANG et al. |
2018 | 15 | Spatial Temporal Graph Convolutional Networks For Skeleton-Based Action Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. |
Sijie Yan; Yuanjun Xiong; Dahua Lin; |
2018 | 16 | Object Detection With Deep Learning: A Review IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a review on deep learning based object detection frameworks. |
Zhong-Qiu Zhao; Peng Zheng; Shou-tao Xu; Xindong Wu; |
2018 | 17 | CornerNet: Detecting Objects As Paired Keypoints IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. |
Hei Law; Jia Deng; |
2018 | 18 | Group Normalization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Group Normalization (GN) as a simple alternative to BN. |
Yuxin Wu; Kaiming He; |
2018 | 19 | ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. |
XINTAO WANG et al. |
2018 | 20 | Unsupervised Representation Learning By Predicting Image Rotations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our work we propose to learn image features by training ConvNets to recognize the 2D rotation that is applied to the image they receive as input. |
Spyros Gidaris; Praveer Singh; Nikos Komodakis; |
2018 | 21 | Residual Dense Network For Image Super-Resolution IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. |
Yulun Zhang; Yapeng Tian; Yu Kong; Bineng Zhong; Yun Fu; |
2018 | 22 | SlowFast Networks For Video Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SlowFast networks for video recognition. |
Christoph Feichtenhofer; Haoqi Fan; Jitendra Malik; Kaiming He; |
2018 | 23 | MnasNet: Platform-Aware Neural Architecture Search For Mobile IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. |
MINGXING TAN et al. |
2018 | 24 | Occupancy Networks: Learning 3D Reconstruction In Function Space IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. |
Lars Mescheder; Michael Oechsle; Michael Niemeyer; Sebastian Nowozin; Andreas Geiger; |
2018 | 25 | ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. |
ROBERT GEIRHOS et al. |
2018 | 26 | Multimodal Unsupervised Image-to-Image Translation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. |
Xun Huang; Ming-Yu Liu; Serge Belongie; Jan Kautz; |
2018 | 27 | CosFace: Large Margin Cosine Loss For Deep Face Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. |
HAO WANG et al. |
2018 | 28 | The HAM10000 Dataset, A Large Collection Of Multi-source Dermatoscopic Images Of Common Pigmented Skin Lesions IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We tackle this problem by releasing the HAM10000 (Human Against Machine with 10000 training images) dataset. |
Philipp Tschandl; Cliff Rosendahl; Harald Kittler; |
2018 | 29 | CCNet: Criss-Cross Attention For Semantic Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. |
ZILONG HUANG et al. |
2018 | 30 | Deep Learning For Generic Object Detection: A Survey IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. |
LI LIU et al. |
2017 | 1 | Mask R-CNN IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a conceptually simple, flexible, and general framework for object instance segmentation. |
Kaiming He; Georgia Gkioxari; Piotr Dollár; Ross Girshick; |
2017 | 2 | Squeeze-and-Excitation Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the Squeeze-and-Excitation (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. |
Jie Hu; Li Shen; Samuel Albanie; Gang Sun; Enhua Wu; |
2017 | 3 | Focal Loss For Dense Object Detection IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate why one-stage detectors have trailed the accuracy of two-stage detectors. |
Tsung-Yi Lin; Priya Goyal; Ross Girshick; Kaiming He; Piotr Dollár; |
2017 | 4 | MobileNets: Efficient Convolutional Neural Networks For Mobile Vision Applications IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a class of efficient models called MobileNets for mobile and embedded vision applications. |
ANDREW G. HOWARD et al. |
2017 | 5 | A Survey On Deep Learning In Medical Image Analysis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. |
GEERT LITJENS et al. |
2017 | 6 | PointNet++: Deep Hierarchical Feature Learning On Point Sets In A Metric Space IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. |
Charles R. Qi; Li Yi; Hao Su; Leonidas J. Guibas; |
2017 | 7 | Non-local Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. |
Xiaolong Wang; Ross Girshick; Abhinav Gupta; Kaiming He; |
2017 | 8 | Rethinking Atrous Convolution For Semantic Image Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter’s field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. |
Liang-Chieh Chen; George Papandreou; Florian Schroff; Hartwig Adam; |
2017 | 9 | Quo Vadis, Action Recognition? A New Model And The Kinetics Dataset IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics. |
Joao Carreira; Andrew Zisserman; |
2017 | 10 | ShuffleNet: An Extremely Efficient Convolutional Neural Network For Mobile Devices IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). |
Xiangyu Zhang; Xinyu Zhou; Mengxiao Lin; Jian Sun; |
2017 | 11 | Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. |
Jun-Yan Zhu; Taesung Park; Phillip Isola; Alexei A. Efros; |
2017 | 12 | Enhanced Deep Residual Networks For Single Image Super-Resolution IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. |
Bee Lim; Sanghyun Son; Heewon Kim; Seungjun Nah; Kyoung Mu Lee; |
2017 | 13 | Learning Transferable Architectures For Scalable Image Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a method to learn the model architectures directly on the dataset of interest. |
Barret Zoph; Vijay Vasudevan; Jonathon Shlens; Quoc V. Le; |
2017 | 14 | Adversarial Discriminative Domain Adaptation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Adversarial learning methods can also improve recognition despite the presence of domain shift or dataset bias: several adversarial approaches to unsupervised domain adaptation have recently been introduced, which reduce the difference between the training and test domain distributions and thus improve generalization performance. |
Eric Tzeng; Judy Hoffman; Kate Saenko; Trevor Darrell; |
2017 | 15 | Dynamic Routing Between Capsules IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. |
Sara Sabour; Nicholas Frosst; Geoffrey E Hinton; |
2017 | 16 | Cascade R-CNN: Delving Into High Quality Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, … |
Zhaowei Cai; Nuno Vasconcelos; |
2017 | 17 | Bottom-Up And Top-Down Attention For Image Captioning And Visual Question Answering IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. |
PETER ANDERSON et al. |
2017 | 18 | Arbitrary Style Transfer In Real-time With Adaptive Instance Normalization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. |
Xun Huang; Serge Belongie; |
2017 | 19 | Learning To Compare: Relation Network For Few-Shot Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each. |
FLOOD SUNG et al. |
2017 | 20 | High-Resolution Image Synthesis And Semantic Manipulation With Conditional GANs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). |
TING-CHUN WANG et al. |
2017 | 21 | Learning Important Features Through Propagating Activation Differences IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. |
Avanti Shrikumar; Peyton Greenside; Anshul Kundaje; |
2017 | 22 | The Kinetics Human Action Video Dataset IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe the DeepMind Kinetics human action video dataset. |
WILL KAY et al. |
2017 | 23 | Accurate, Large Minibatch SGD: Training ImageNet In 1 Hour IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. |
PRIYA GOYAL et al. |
2017 | 24 | Improved Regularization Of Convolutional Neural Networks With Cutout IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. |
Terrance DeVries; Graham W. Taylor; |
2017 | 25 | ScanNet: Richly-annotated 3D Reconstructions Of Indoor Scenes IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. |
ANGELA DAI et al. |
2017 | 26 | StarGAN: Unified Generative Adversarial Networks For Multi-Domain Image-to-Image Translation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. |
YUNJEY CHOI et al. |
2017 | 27 | VoxelNet: End-to-End Learning For Point Cloud Based 3D Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. |
Yin Zhou; Oncel Tuzel; |
2017 | 28 | Random Erasing Data Augmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). |
Zhun Zhong; Liang Zheng; Guoliang Kang; Shaozi Li; Yi Yang; |
2017 | 29 | Billion-scale Similarity Search With GPUs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. |
Jeff Johnson; Matthijs Douze; Hervé Jégou; |
2017 | 30 | Residual Attention Network For Image Classification IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Residual Attention Network, a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. |
FEI WANG et. al. |
2016 | 1 | Densely Connected Convolutional Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. |
Gao Huang; Zhuang Liu; Laurens van der Maaten; Kilian Q. Weinberger; |
2016 | 2 | Feature Pyramid Networks For Object Detection IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. |
TSUNG-YI LIN et. al. |
2016 | 3 | Image-to-Image Translation With Conditional Adversarial Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. |
Phillip Isola; Jun-Yan Zhu; Tinghui Zhou; Alexei A. Efros; |
2016 | 4 | Grad-CAM: Visual Explanations From Deep Networks Via Gradient-based Localization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a technique for producing visual explanations for decisions from a large class of CNN-based models, making them more transparent. |
RAMPRASAATH R. SELVARAJU et. al. |
2016 | 5 | DeepLab: Semantic Image Segmentation With Deep Convolutional Nets, Atrous Convolution, And Fully Connected CRFs IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. |
Liang-Chieh Chen; George Papandreou; Iasonas Kokkinos; Kevin Murphy; Alan L. Yuille; |
2016 | 6 | YOLO9000: Better, Faster, Stronger IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. |
Joseph Redmon; Ali Farhadi; |
2016 | 7 | Inception-v4, Inception-ResNet And The Impact Of Residual Connections On Learning IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With an ensemble of three residual and one Inception-v4, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge. |
Christian Szegedy; Sergey Ioffe; Vincent Vanhoucke; Alex Alemi; |
2016 | 8 | Xception: Deep Learning With Depthwise Separable Convolutions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). |
François Chollet; |
2016 | 9 | PointNet: Deep Learning On Point Sets For 3D Classification And Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design a novel type of neural network that directly consumes point clouds and well respects the permutation invariance of points in the input. |
Charles R. Qi; Hao Su; Kaichun Mo; Leonidas J. Guibas; |
2016 | 10 | Pyramid Scene Parsing Network IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). |
Hengshuang Zhao; Jianping Shi; Xiaojuan Qi; Xiaogang Wang; Jiaya Jia; |
2016 | 11 | The Cityscapes Dataset For Semantic Urban Scene Understanding IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. |
MARIUS CORDTS et. al. |
2016 | 12 | Photo-Realistic Single Image Super-Resolution Using A Generative Adversarial Network IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). |
CHRISTIAN LEDIG et. al. |
2016 | 13 | Perceptual Losses For Real-Time Style Transfer And Super-Resolution IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. |
Justin Johnson; Alexandre Alahi; Li Fei-Fei; |
2016 | 14 | Identity Mappings In Deep Residual Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. |
Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun; |
2016 | 15 | Aggregated Residual Transformations For Deep Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple, highly modularized network architecture for image classification. |
Saining Xie; Ross Girshick; Piotr Dollár; Zhuowen Tu; Kaiming He; |
2016 | 16 | V-Net: Fully Convolutional Neural Networks For Volumetric Medical Image Segmentation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network. |
Fausto Milletari; Nassir Navab; Seyed-Ahmad Ahmadi; |
2016 | 17 | Wide Residual Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. |
Sergey Zagoruyko; Nikos Komodakis; |
2016 | 18 | Beyond A Gaussian Denoiser: Residual Learning Of Deep CNN For Image Denoising IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. |
Kai Zhang; Wangmeng Zuo; Yunjin Chen; Deyu Meng; Lei Zhang; |
2016 | 19 | Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach to efficiently detect the 2D pose of multiple people in an image. |
Zhe Cao; Tomas Simon; Shih-En Wei; Yaser Sheikh; |
2016 | 20 | 3D U-Net: Learning Dense Volumetric Segmentation From Sparse Annotation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. |
Özgün Çiçek; Ahmed Abdulkadir; Soeren S. Lienkamp; Thomas Brox; Olaf Ronneberger; |
2016 | 21 | R-FCN: Object Detection Via Region-based Fully Convolutional Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present region-based, fully convolutional networks for accurate and efficient object detection. |
Jifeng Dai; Yi Li; Kaiming He; Jian Sun; |
2016 | 22 | Adversarial Examples In The Physical World IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. |
Alexey Kurakin; Ian Goodfellow; Samy Bengio; |
2016 | 23 | Context Encoders: Feature Learning By Inpainting IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. |
Deepak Pathak; Philipp Krahenbuhl; Jeff Donahue; Trevor Darrell; Alexei A. Efros; |
2016 | 24 | Stacked Hourglass Networks For Human Pose Estimation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces a novel convolutional network architecture for the task of human pose estimation. |
Alejandro Newell; Kaiyu Yang; Jia Deng; |
2016 | 25 | Real-Time Single Image And Video Super-Resolution Using An Efficient Sub-Pixel Convolutional Neural Network IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. |
WENZHE SHI et. al. |
2016 | 26 | Deep Convolutional Neural Networks For Computer-Aided Detection: CNN Architectures, Dataset Characteristics And Transfer Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. |
HOO-CHANG SHIN et. al. |
2016 | 27 | Least Squares Generative Adversarial Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. |
XUDONG MAO et. al. |
2016 | 28 | XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. |
Mohammad Rastegari; Vicente Ordonez; Joseph Redmon; Ali Farhadi; |
2016 | 29 | End To End Learning For Self-Driving Cars IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. |
MARIUSZ BOJARSKI et. al. |
2015 | 1 | Deep Residual Learning For Image Recognition IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. |
Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun; |
2015 | 2 | U-Net: Convolutional Networks For Biomedical Image Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. |
Olaf Ronneberger; Philipp Fischer; Thomas Brox; |
2015 | 3 | Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. |
Shaoqing Ren; Kaiming He; Ross Girshick; Jian Sun; |
2015 | 4 | You Only Look Once: Unified, Real-Time Object Detection IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present YOLO, a new approach to object detection. |
Joseph Redmon; Santosh Divvala; Ross Girshick; Ali Farhadi; |
2015 | 5 | SSD: Single Shot MultiBox Detector IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method for detecting objects in images using a single deep neural network. |
WEI LIU et. al. |
2015 | 6 | Rethinking The Inception Architecture For Computer Vision IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. |
Christian Szegedy; Vincent Vanhoucke; Sergey Ioffe; Jonathon Shlens; Zbigniew Wojna; |
2015 | 7 | Fast R-CNN IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. |
Ross Girshick; |
2015 | 8 | Delving Deep Into Rectifiers: Surpassing Human-Level Performance On ImageNet Classification IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study rectifier neural networks for image classification from two aspects. |
Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun; |
2015 | 9 | SegNet: A Deep Convolutional Encoder-Decoder Architecture For Image Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. |
Vijay Badrinarayanan; Alex Kendall; Roberto Cipolla; |
2015 | 10 | FaceNet: A Unified Embedding For Face Recognition And Clustering IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. |
Florian Schroff; Dmitry Kalenichenko; James Philbin; |
2015 | 11 | Learning Deep Features For Discriminative Localization IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. |
Bolei Zhou; Aditya Khosla; Agata Lapedriza; Aude Oliva; Antonio Torralba; |
2015 | 12 | Multi-Scale Context Aggregation By Dilated Convolutions IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a new convolutional network module that is specifically designed for dense prediction. |
Fisher Yu; Vladlen Koltun; |
2015 | 13 | Convolutional LSTM Network: A Machine Learning Approach For Precipitation Nowcasting IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. |
XINGJIAN SHI et. al. |
2015 | 14 | Spatial Transformer Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. |
Max Jaderberg; Karen Simonyan; Andrew Zisserman; Koray Kavukcuoglu; |
2015 | 15 | Accurate Image Super-Resolution Using Very Deep Convolutional Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a highly accurate single-image super-resolution (SR) method. |
Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee; |
2015 | 16 | Recent Advances In Convolutional Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a broad survey of the recent advances in convolutional neural networks. |
JIUXIANG GU et. al. |
2015 | 17 | Learning Deconvolution Network For Semantic Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel semantic segmentation algorithm by learning a deconvolution network. |
Hyeonwoo Noh; Seunghoon Hong; Bohyung Han; |
2015 | 18 | FlowNet: Learning Optical Flow With Convolutional Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. |
PHILIPP FISCHER et. al. |
2015 | 19 | Holistically-Nested Edge Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning. |
Saining Xie; Zhuowen Tu; |
2015 | 20 | Multi-view Convolutional Neural Networks For 3D Shape Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. |
Hang Su; Subhransu Maji; Evangelos Kalogerakis; Erik Learned-Miller; |
2015 | 21 | A Neural Algorithm Of Artistic Style IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. |
Leon A. Gatys; Alexander S. Ecker; Matthias Bethge; |
2015 | 22 | Unsupervised Visual Representation Learning By Context Prediction IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. |
Carl Doersch; Abhinav Gupta; Alexei A. Efros; |
2015 | 23 | Brain Tumor Segmentation With Deep Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a fully automatic brain tumor segmentation method based on Deep Neural Networks (DNNs). |
MOHAMMAD HAVAEI et. al. |
2015 | 24 | Conditional Random Fields As Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. |
SHUAI ZHENG et. al. |
2015 | 25 | A Large Dataset To Train Convolutional Networks For Disparity, Optical Flow, And Scene Flow Estimation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. |
NIKOLAUS MAYER et. al. |
2015 | 26 | Deeply-Recursive Convolutional Network For Image Super-Resolution IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an image super-resolution method (SR) using a deeply-recursive convolutional network (DRCN). |
Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee; |
2015 | 27 | NetVLAD: CNN Architecture For Weakly Supervised Place Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. |
Relja Arandjelović; Petr Gronat; Akihiko Torii; Tomas Pajdla; Josef Sivic; |
2015 | 28 | Learning Multi-Domain Convolutional Neural Networks For Visual Tracking IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). |
Hyeonseob Nam; Bohyung Han; |
2015 | 29 | Aligning Books And Movies: Towards Story-like Visual Explanations By Watching Movies And Reading Books IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a context-aware CNN to combine information from multiple sources. |
YUKUN ZHU et. al. |
2015 | 30 | Cyclical Learning Rates For Training Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. |
Leslie N. Smith; |
2014 | 1 | Very Deep Convolutional Networks For Large-Scale Image Recognition IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. |
Karen Simonyan; Andrew Zisserman; |
2014 | 2 | Going Deeper With Convolutions IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). |
CHRISTIAN SZEGEDY et. al. |
2014 | 3 | Microsoft COCO: Common Objects In Context IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. |
TSUNG-YI LIN et. al. |
2014 | 4 | ImageNet Large Scale Visual Recognition Challenge IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements. |
OLGA RUSSAKOVSKY et. al. |
2014 | 5 | Fully Convolutional Networks For Semantic Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key insight is to build fully convolutional networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. |
Jonathan Long; Evan Shelhamer; Trevor Darrell; |
2014 | 6 | Caffe: Convolutional Architecture For Fast Feature Embedding IF:10 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. … |
YANGQING JIA et. al. |
2014 | 7 | Spatial Pyramid Pooling In Deep Convolutional Networks For Visual Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we equip the networks with another pooling strategy, spatial pyramid pooling, to eliminate the above requirement. |
Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun; |
2014 | 8 | Learning Spatiotemporal Features With 3D Convolutional Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. |
Du Tran; Lubomir Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri; |
2014 | 9 | Deep Learning Face Attributes In The Wild IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel deep learning framework for attribute prediction in the wild. |
Ziwei Liu; Ping Luo; Xiaogang Wang; Xiaoou Tang; |
2014 | 10 | Image Super-Resolution Using Deep Convolutional Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a deep learning method for single image super-resolution (SR). |
Chao Dong; Chen Change Loy; Kaiming He; Xiaoou Tang; |
2014 | 11 | Two-Stream Convolutional Networks For Action Recognition In Videos IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. |
Karen Simonyan; Andrew Zisserman; |
2014 | 12 | Long-term Recurrent Convolutional Networks For Visual Recognition And Description IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. |
JEFF DONAHUE et. al. |
2014 | 13 | Show And Tell: A Neural Image Caption Generator IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. |
Oriol Vinyals; Alexander Toshev; Samy Bengio; Dumitru Erhan; |
2014 | 14 | Deep Visual-Semantic Alignments For Generating Image Descriptions IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a model that generates natural language descriptions of images and their regions. |
Andrej Karpathy; Li Fei-Fei; |
2014 | 15 | 3D ShapeNets: A Deep Representation For Volumetric Shapes IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. |
ZHIRONG WU et. al. |
2014 | 16 | High-Speed Tracking With Kernelized Correlation Filters IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. |
João F. Henriques; Rui Caseiro; Pedro Martins; Jorge Batista; |
2014 | 17 | CNN Features Off-the-shelf: An Astounding Baseline For Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. |
Ali Sharif Razavian; Hossein Azizpour; Josephine Sullivan; Stefan Carlsson; |
2014 | 18 | Semantic Image Segmentation With Deep Convolutional Nets And Fully Connected CRFs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). |
Liang-Chieh Chen; George Papandreou; Iasonas Kokkinos; Kevin Murphy; Alan L. Yuille; |
2014 | 19 | CIDEr: Consensus-based Image Description Evaluation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel paradigm for evaluating image descriptions that uses human consensus. |
Ramakrishna Vedantam; C. Lawrence Zitnick; Devi Parikh; |
2014 | 20 | Depth Map Prediction From A Single Image Using A Multi-Scale Deep Network IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. |
David Eigen; Christian Puhrsch; Rob Fergus; |
2014 | 21 | Return Of The Devil In The Details: Delving Deep Into Convolutional Nets IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Source code and models to reproduce the experiments in the paper are made publicly available. |
Ken Chatfield; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman; |
2014 | 22 | Deep Neural Networks Are Easily Fooled: High Confidence Predictions For Unrecognizable Images IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). |
Anh Nguyen; Jason Yosinski; Jeff Clune; |
2014 | 23 | Image Processing IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we regard GMPs as smooth surfaces. |
Franco Rino; |
2014 | 24 | Predicting Depth, Surface Normals And Semantic Labels With A Common Multi-Scale Convolutional Architecture IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. |
David Eigen; Rob Fergus; |
2014 | 25 | Deep Domain Confusion: Maximizing For Domain Invariance IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new CNN architecture which introduces an adaptation layer and an additional domain confusion loss, to learn a representation that is both semantically meaningful and domain invariant. |
Eric Tzeng; Judy Hoffman; Ning Zhang; Kate Saenko; Trevor Darrell; |
2014 | 26 | Deep Learning Face Representation By Joint Identification-Verification IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. |
Yi Sun; Xiaogang Wang; Xiaoou Tang; |
2014 | 27 | Person Re-identification By Local Maximal Occurrence Representation And Metric Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). |
Shengcai Liao; Yang Hu; Xiangyu Zhu; Stan Z. Li; |
2014 | 28 | Learning Face Representation From Scratch IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve this problem, this paper proposes a semi-automatic way to collect face images from the Internet and builds a large-scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace. |
Dong Yi; Zhen Lei; Shengcai Liao; Stan Z. Li; |
2014 | 29 | Understanding Deep Image Representations By Inverting Them IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we conduct a direct analysis of the visual information contained in representations by asking the following question: given an encoding of an image, to which extent is it possible to reconstruct the image itself? |
Aravindh Mahendran; Andrea Vedaldi; |
2014 | 30 | Exploiting Linear Structure Within Convolutional Networks For Efficient Evaluation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. |
Emily Denton; Wojciech Zaremba; Joan Bruna; Yann LeCun; Rob Fergus; |
2013 | 1 | Rich Feature Hierarchies For Accurate Object Detection And Semantic Segmentation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012—achieving a mAP of 53.3%. |
Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik; |
2013 | 2 | Visualizing And Understanding Convolutional Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. |
Matthew D Zeiler; Rob Fergus; |
2013 | 3 | Intriguing Properties Of Neural Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we report two such properties. |
CHRISTIAN SZEGEDY et. al. |
2013 | 4 | Deep Inside Convolutional Networks: Visualising Image Classification Models And Saliency Maps IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. |
Karen Simonyan; Andrea Vedaldi; Andrew Zisserman; |
2013 | 5 | OverFeat: Integrated Recognition, Localization And Detection Using Convolutional Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an integrated framework for using Convolutional Networks for classification, localization and detection. |
PIERRE SERMANET et. al. |
2013 | 6 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
JEFF DONAHUE et. al. |
2013 | 7 | DeepPose: Human Pose Estimation Via Deep Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for human pose estimation based on Deep Neural Networks (DNNs). |
Alexander Toshev; Christian Szegedy; |
2013 | 8 | Describing Textures In The Wild IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Aiming at supporting this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. |
Mircea Cimpoi; Subhransu Maji; Iasonas Kokkinos; Sammy Mohamed; Andrea Vedaldi; |
2013 | 9 | Fine-Grained Visual Classification Of Aircraft IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 aircraft models, organised in a three-level hierarchy. |
Subhransu Maji; Esa Rahtu; Juho Kannala; Matthew Blaschko; Andrea Vedaldi; |
2013 | 10 | Zero-Shot Learning Through Cross-Modal Transfer IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces a model that can recognize objects in images even if no training data is available for the objects. |
RICHARD SOCHER et. al. |
2013 | 11 | Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). |
Wufeng Xue; Lei Zhang; Xuanqin Mou; Alan C. Bovik; |
2013 | 12 | Scalable Object Detection Using Deep Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. |
Dumitru Erhan; Christian Szegedy; Alexander Toshev; Dragomir Anguelov; |
2013 | 13 | Image Segmentation In Video Sequences: A Probabilistic Approach IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. |
Nir Friedman; Stuart Russell; |
2013 | 14 | Medical Image Fusion: A Survey Of The State Of The Art IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This review article provides a factual listing of methods and summarizes the broad scientific challenges faced in the field of medical image fusion. |
A. P. James; B. V. Dasarathy; |
2013 | 15 | A Survey Of Appearance Models In Visual Object Tracking IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. |
XI LI et. al. |
2013 | 16 | Multi-digit Number Recognition From Street View Imagery Using Deep Convolutional Neural Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. |
Ian J. Goodfellow; Yaroslav Bulatov; Julian Ibarz; Sacha Arnoud; Vinay Shet; |
2013 | 17 | SEEDS: Superpixels Extracted Via Energy-Driven Sampling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new approach based on a simple hill-climbing optimization. |
Michael Van den Bergh; Xavier Boix; Gemma Roig; Luc Van Gool; |
2013 | 18 | Advances In Hyperspectral Image Classification: Earth Monitoring With Statistical Learning Methods IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: New methods have been presented to account for the spatial homogeneity of images, to include user’s interaction via active learning, to take advantage of the manifold structure with semisupervised learning, to extract and encode invariances, or to adapt classifiers and image representations to unseen yet similar scenes. |
Gustavo Camps-Valls; Devis Tuia; Lorenzo Bruzzone; Jón Atli Benediktsson; |
2013 | 19 | Fast Training Of Convolutional Networks Through FFTs IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a simple algorithm which accelerates training and inference by a significant factor, and can yield improvements of over an order of magnitude compared to existing state-of-the-art implementations. |
Michael Mathieu; Mikael Henaff; Yann LeCun; |
2013 | 20 | Rotational Projection Statistics For 3D Local Surface Description And Object Recognition IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel method named Rotational Projection Statistics (RoPS). |
Yulan Guo; Ferdous Sohel; Mohammed Bennamoun; Min Lu; Jianwei Wan; |
2013 | 21 | Dropout Improves Recurrent Neural Networks For Handwriting Recognition IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that their performance can be greatly improved using dropout – a recently proposed regularization method for deep architectures. |
Vu Pham; Théodore Bluche; Christopher Kermorvant; Jérôme Louradour; |
2013 | 22 | PANDA: Pose Aligned Networks For Deep Attribute Modeling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion. |
Ning Zhang; Manohar Paluri; Marc’Aurelio Ranzato; Trevor Darrell; Lubomir Bourdev; |
2013 | 23 | Indoor Semantic Segmentation Using Depth Information IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. |
Camille Couprie; Clément Farabet; Laurent Najman; Yann LeCun; |
2013 | 24 | Recognizing Image Style IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe an approach to predicting style of images, and perform a thorough evaluation of different image features for these tasks. |
SERGEY KARAYEV et. al. |
2013 | 25 | Deep Convolutional Ranking For Multilabel Image Annotation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to leverage the advantage of such features and analyze key components that lead to better performances. |
Yunchao Gong; Yangqing Jia; Thomas Leung; Alexander Toshev; Sergey Ioffe; |
2013 | 26 | Some Improvements On Deep Convolutional Neural Network Based Image Classification IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate multiple techniques to improve upon the current state of the art deep convolutional neural network based image classification pipeline. |
Andrew G. Howard; |
2013 | 27 | Coded Aperture Compressive Temporal Imaging IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present experimental results for reconstruction at 148 frames per coded snapshot. |
PATRICK LLULL et. al. |
2013 | 28 | Fast Image Scanning With Deep Max-Pooling Convolutional Neural Networks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how dynamic programming can speed up the process by orders of magnitude, even when max-pooling layers are present. |
Alessandro Giusti; Dan C. Cireşan; Jonathan Masci; Luca M. Gambardella; Jürgen Schmidhuber; |
2013 | 29 | Shadow Detection: A Survey And Comparative Evaluation Of Recent Methods IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The survey covers methods published during the last decade, and places them in a feature-based taxonomy comprised of four categories: chromacity, physical, geometry and textures. |
Andres Sanin; Conrad Sanderson; Brian C. Lovell; |
2013 | 30 | Patch-based Probabilistic Image Quality Assessment For Face Selection And Improved Video-based Face Recognition IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an ideal face. |
Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian C. Lovell; |
2012 | 1 | UCF101: A Dataset Of 101 Human Actions Classes From Videos In The Wild IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce UCF101 which is currently the largest dataset of human actions. |
Khurram Soomro; Amir Roshan Zamir; Mubarak Shah; |
2012 | 2 | Multi-column Deep Neural Networks For Image Classification IF:9 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our … |
Dan Cireşan; Ueli Meier; Juergen Schmidhuber; |
2012 | 3 | Efficient Inference In Fully Connected CRFs With Gaussian Edge Potentials IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. |
Philipp Krähenbühl; Vladlen Koltun; |
2012 | 4 | Sparse Subspace Clustering: Algorithm, Theory, And Applications IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. |
Ehsan Elhamifar; Rene Vidal; |
2012 | 5 | Invariant Scattering Convolution Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. |
Joan Bruna; Stéphane Mallat; |
2012 | 6 | Generalized Principal Component Analysis (GPCA) IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. |
Rene Vidal; Yi Ma; Shankar Sastry; |
2012 | 7 | Pedestrian Detection With Unsupervised Multi-Stage Feature Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adding to the list of successful applications of deep learning methods to vision, we report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model. |
Pierre Sermanet; Koray Kavukcuoglu; Soumith Chintala; Yann LeCun; |
2012 | 8 | An Evaluation Of Popular Copy-Move Forgery Detection Approaches IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various postprocessing scenarios. |
Vincent Christlein; Christian Riess; Johannes Jordan; Corinna Riess; Elli Angelopoulou; |
2012 | 9 | Unsupervised Discovery Of Mid-Level Discriminative Patches IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to discover a set of discriminative patches which can serve as a fully unsupervised mid-level visual representation. |
Saurabh Singh; Abhinav Gupta; Alexei A. Efros; |
2012 | 10 | A Multi-View Embedding Space For Modeling Internet Images, Tags, And Their Semantics IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. |
Yunchao Gong; Qifa Ke; Michael Isard; Svetlana Lazebnik; |
2012 | 11 | Convolutional Neural Networks Applied To House Numbers Digit Classification IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). |
Pierre Sermanet; Soumith Chintala; Yann LeCun; |
2012 | 12 | Face Expression Recognition And Analysis: The State Of The Art IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents a time-line view of the advances made in this field, the applications of automatic face expression recognizers, the characteristics of an ideal system, the databases that have been used and the advances made in terms of their standardization and a detailed summary of the state of the art. |
Vinay Bettadapura; |
2012 | 13 | Poisson Noise Reduction With Non-local PCA IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel denoising algorithm for photon-limited images which combines elements of dictionary learning and sparse patch-based representations of images. |
Joseph Salmon; Zachary Harmany; Charles-Alban Deledalle; Rebecca Willett; |
2012 | 14 | A New Local Adaptive Thresholding Technique In Binarization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a locally adaptive thresholding technique that removes background by using local mean and mean deviation. |
T. Romen Singh; Sudipta Roy; O. Imocha Singh; Tejmani Sinam; Kh. Manglem Singh; |
2012 | 15 | Regularized Robust Coding For Face Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients. |
Meng Yang; Lei Zhang; Jian Yang; David Zhang; |
2012 | 16 | Stable Image Reconstruction Using Total Variation Minimization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article presents near-optimal guarantees for accurate and robust image recovery from under-sampled noisy measurements using total variation minimization. |
Deanna Needell; Rachel Ward; |
2012 | 17 | Constructing The L2-Graph For Robust Subspace Learning And Subspace Clustering IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. |
Xi Peng; Zhiding Yu; Huajin Tang; Zhang Yi; |
2012 | 18 | Collaborative Representation Based Classification For Face Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we discuss how SRC works, and show that the collaborative representation mechanism used in SRC is much more crucial to its success of face classification. |
Lei Zhang; Meng Yang; Xiangchu Feng; Yi Ma; David Zhang; |
2012 | 19 | Scene Parsing With Multiscale Feature Learning, Purity Trees, And Optimal Covers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. |
Clément Farabet; Camille Couprie; Laurent Najman; Yann LeCun; |
2012 | 20 | Mahotas: Open Source Software For Scriptable Computer Vision IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The interface is in Python, a dynamic programming language, which is very appropriate for fast development, but the algorithms are implemented in C++ and are tuned for speed. |
Luis Pedro Coelho; |
2012 | 21 | SVD Based Image Processing Applications: State Of The Art, Contributions And Research Challenges IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aim of this paper is to provide a better understanding of the SVD in image processing and to identify various important applications and open research directions in this increasingly important area of SVD-based image processing. |
Rowayda A. Sadek; |
2012 | 22 | Multimodal Similarity-preserving Hashing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an efficient computational framework for hashing data belonging to multiple modalities into a single representation space where they become mutually comparable. |
Jonathan Masci; Michael M. Bronstein; Alexander A. Bronstein; Jürgen Schmidhuber; |
2012 | 23 | Real-time Image-based 6-DOF Localization In Large-Scale Environments IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a real-time approach for image-based localization within large scenes that have been reconstructed offline using structure from motion (SfM). |
Hyon Lim; Sudipta Sinha; Michael Cohen; Matt Uyttendaele; |
2012 | 24 | Stable And Robust Sampling Strategies For Compressive Imaging IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we turn to a more refined notion of coherence — the so-called local coherence — measuring for each sensing vector separately how correlated it is to the sparsity basis. |
Felix Krahmer; Rachel Ward; |
2012 | 25 | Image Labeling On A Network: Using Social-Network Metadata For Image Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since these types of data are inherently relational, we propose a model that explicitly accounts for the interdependencies between images sharing common properties. |
Julian McAuley; Jure Leskovec; |
2012 | 26 | Image Processing Using Smooth Ordering Of Its Patches IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an image processing scheme based on reordering of its patches. |
Idan Ram; Michael Elad; Israel Cohen; |
2012 | 27 | Graph Degree Linkage: Agglomerative Clustering On A Directed Graph IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a simple but effective graph-based agglomerative algorithm, for clustering high-dimensional data. |
Wei Zhang; Xiaogang Wang; Deli Zhao; Xiaoou Tang; |
2012 | 28 | Difference Of Normals As A Multi-Scale Operator In Unorganized Point Clouds IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Difference of Normals (DoN) provides a computationally efficient, multi-scale approach to processing large unorganized 3D point clouds. |
Yani Ioannou; Babak Taati; Robin Harrap; Michael Greenspan; |
2012 | 29 | Kernel Principal Component Analysis And Its Applications In Face Recognition And Active Shape Models IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Principal component analysis (PCA) is a popular tool for linear dimensionality reduction and feature extraction. Kernel PCA is the nonlinear form of PCA, which better exploits the … |
Quan Wang; |
2012 | 30 | Intra-Retinal Layer Segmentation Of 3D Optical Coherence Tomography Using Coarse Grained Diffusion Map IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a fast segmentation method based on a new variant of spectral graph theory named diffusion maps. |
Raheleh Kafieh; Hossein Rabbani; Michael D. Abramoff; Milan Sonka; |
2011 | 1 | Moving Object Detection By Detecting Contiguous Outliers In The Low-Rank Representation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). |
Xiaowei Zhou; Can Yang; Weichuan Yu; |
2011 | 2 | 3D Terrestrial Lidar Data Classification Of Complex Natural Scenes Using A Multi-scale Dimensionality Criterion: Applications In Geomorphology IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the technique and illustrate its efficiency in separating riparian vegetation from ground and classifying a mountain stream as vegetation, rock, gravel or water surface. |
Nicolas Brodu; Dimitri Lague; |
2011 | 3 | Local Naive Bayes Nearest Neighbor For Image Classification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Local Naive Bayes Nearest Neighbor, an improvement to the NBNN image classification algorithm that increases classification accuracy and improves its ability to scale to large numbers of object classes. |
Sancho McCann; David G. Lowe; |
2011 | 4 | Compressive Imaging Using Approximate Message Passing And A Markov-Tree Prior IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. |
Subhojit Som; Philip Schniter; |
2011 | 5 | Introduction To The Bag Of Features Paradigm For Image Classification And Retrieval IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an introduction to BoF image representations, describes critical design choices, and surveys the BoF literature. |
Stephen O’Hara; Bruce A. Draper; |
2011 | 6 | SHREC 2011: Robust Feature Detection And Description Benchmark IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present paper is a report of the SHREC’11 robust feature detection and description benchmark results. |
E. BOYER et. al. |
2011 | 7 | Minutiae Extraction From Fingerprint Images – A Review IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a review of a large number of techniques present in the literature for extracting fingerprint minutiae. |
Roli Bansal; Priti Sehgal; Punam Bedi; |
2011 | 8 | Continuous Multiclass Labeling Approaches And Algorithms IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The generic framework ensures existence of minimizers and covers a wide range of relaxations of the originally combinatorial problem. |
Jan Lellmann; Christoph Schnörr; |
2011 | 9 | A Supervised Clustering Approach For FMRI-based Inference Of Brain States IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging (fMRI) to predict the subject’s behavior during a scanning session. |
VINCENT MICHEL et. al. |
2011 | 10 | A Panorama On Multiscale Geometric Representations, Intertwining Spatial, Directional And Frequency Selectivity IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. |
Laurent Jacques; Laurent Duval; Caroline Chaux; Gabriel Peyré; |
2011 | 11 | Prostate Biopsy Tracking With Deformation Estimation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a volume-swept 3D US based tracking system for fast and accurate estimation of prostate tissue motion is proposed. |
Michael Baumann; Pierre Mozer; Vincent Daanen; Jocelyne Troccaz; |
2011 | 12 | An Axis-Based Representation For Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new axis-based shape representation scheme along with a matching framework to address the problem of generic shape recognition. |
Cagri Aslan; Sibel Tari; |
2011 | 13 | The IHS Transformations Based Image Fusion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, the main purpose of this work is to explore different IHS transformation techniques and experiment with them for IHS-based image fusion. |
Firouz Abdullah Al-Wassai; N. V. Kalyankar; Ali A. Al-Zuky; |
2011 | 14 | A Review Of Research On Devnagari Character Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article is intended to serve as a guide and update for the readers, working in the Devnagari Optical Character Recognition (DOCR) area. |
V J Dongre; V H Mankar; |
2011 | 15 | Real Time Face Recognition Using Adaboost Improved Fast PCA Algorithm IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an automated system for human face recognition in a real-time, real-world background setting, using a large homemade dataset of persons' faces. |
K. Susheel Kumar; Vijay Bhaskar Semwal; R C Tripathi; |
2011 | 16 | On The Cohomology Of 3D Digital Images IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for computing the cohomology ring of three-dimensional (3D) digital binary-valued pictures. |
Rocio Gonzalez-Diaz; Pedro Real; |
2011 | 17 | Positive Semidefinite Metric Learning Using Boosting-like Algorithms IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. |
Chunhua Shen; Junae Kim; Lei Wang; Anton van den Hengel; |
2011 | 18 | Disconnected Skeleton: Shape At Its Absolute Scale IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new skeletal representation along with a matching framework to address the deformable shape recognition problem. |
C. Aslan; A. Erdem; E. Erdem; S. Tari; |
2011 | 19 | Statistical Compressed Sensing Of Gaussian Mixture Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A novel framework of compressed sensing, namely statistical compressed sensing (SCS), that aims at efficiently sampling a collection of signals that follow a statistical distribution, and achieving accurate reconstruction on average, is introduced. |
Guoshen Yu; Guillermo Sapiro; |
2011 | 20 | Design Of An Optical Character Recognition System For Camera-based Handheld Devices IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a complete Optical Character Recognition (OCR) system for camera captured image/graphics embedded textual documents for handheld devices. |
Ayatullah Faruk Mollah; Nabamita Majumder; Subhadip Basu; Mita Nasipuri; |
2011 | 21 | A Multiple Component Matching Framework For Person Re-Identification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on these similarities, we propose a Multiple Component Matching (MCM) framework for the person re-identification problem, which is inspired by Multiple Component Learning, a framework recently proposed for object detection. |
Riccardo Satta; Giorgio Fumera; Fabio Roli; Marco Cristani; Vittorio Murino; |
2011 | 22 | Salient Local 3D Features For 3D Shape Retrieval IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe a new formulation for the 3D salient local features based on the voxel grid inspired by the Scale Invariant Feature Transform (SIFT). |
Afzal Godil; Asim Imdad Wagan; |
2011 | 23 | Fingerprint Recognition Using Standardized Fingerprint Model IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper discusses the standardized fingerprint model, which is used to synthesize the template of fingerprints. |
Le Hoang Thai; Ha Nhat Tam; |
2011 | 24 | A Linear Framework For Region-based Image Segmentation And Inpainting Involving Curvature Penalization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first method to handle curvature regularity in region-based image segmentation and inpainting that is independent of initialization. |
Thomas Schoenemann; Fredrik Kahl; Simon Masnou; Daniel Cremers; |
2011 | 25 | Convex Approaches To Model Wavelet Sparsity Patterns IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose new modeling approaches based on group-sparsity penalties that lead to convex optimizations that can be solved exactly and efficiently. |
Nikhil S Rao; Robert D. Nowak; Stephen J. Wright; Nick G. Kingsbury; |
2011 | 26 | Anti-sparse Coding For Approximate Nearest Neighbor Search IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes a binarization scheme for vectors of high dimension based on the recent concept of anti-sparse coding, and shows its excellent performance for approximate … |
Hervé Jégou; Teddy Furon; Jean-Jacques Fuchs; |
2011 | 27 | Leveraging Billions Of Faces To Overcome Performance Barriers In Unconstrained Face Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We employ the face recognition technology developed in house at face.com to a well accepted benchmark and show that without any tuning we are able to considerably surpass state of the art results. |
Yaniv Taigman; Lior Wolf; |
2011 | 28 | A Comparative Experiment Of Several Shape Methods In Recognizing Plants IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research, a comparative experiment of 4 methods to identify plants using shape features was accomplished. |
A. Kadir; L. E. Nugroho; A. Susanto; P. I. Santosa; |
2011 | 29 | Steps Towards A Theory Of Visual Information: Active Perception, Signal-to-Symbol Conversion And The Interplay Between Sensing And Control IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This manuscript describes the elements of a theory of information tailored to control and decision tasks and specifically to visual data. |
Stefano Soatto; |
2011 | 30 | Estimating 3D Human Shapes From Measurements IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a technique that extrapolates the statistically inferred shape to fit the measurement data using nonlinear optimization. |
Stefanie Wuhrer; Chang Shu; |
2010 | 1 | Image Deblurring And Super-resolution By Adaptive Sparse Domain Selection And Adaptive Regularization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that the contents can vary significantly across different images or different patches in a single image, we propose to learn various sets of bases from a pre-collected dataset of example image patches, and then for a given patch to be processed, one set of bases are adaptively selected to characterize the local sparse domain. |
Weisheng Dong; Lei Zhang; Guangming Shi; Xiaolin Wu; |
2010 | 2 | Solving Inverse Problems With Piecewise Linear Estimators: From Gaussian Mixture Models To Structured Sparsity IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A general framework for solving image inverse problems is introduced in this paper. |
Guoshen Yu; Guillermo Sapiro; Stéphane Mallat; |
2010 | 3 | Survey Of Nearest Neighbor Techniques IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the survey of such techniques. |
Nitin Bhatia; |
2010 | 4 | Lesion Border Detection In Dermoscopy Images IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Methods: In this article, we present a systematic overview of the recent border detection methods in the literature paying particular attention to computational issues and evaluation aspects. |
M. Emre Celebi; Hitoshi Iyatomi; Gerald Schaefer; William V. Stoecker; |
2010 | 5 | A Comprehensive Review Of Image Enhancement Techniques IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The paper focuses on spatial domain techniques for image enhancement, with particular reference to point processing methods and histogram processing. |
Raman Maini; Himanshu Aggarwal; |
2010 | 6 | TILT: Transform Invariant Low-rank Textures IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show how to efficiently and effectively extract a class of low-rank textures in a 3D scene from 2D images despite significant corruptions and warping. |
Zhengdong Zhang; Arvind Ganesh; Xiao Liang; Yi Ma; |
2010 | 7 | Image Segmentation By Using Threshold Techniques IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies image segmentation using five threshold methods, namely the Mean method, the P-tile method, the Histogram Dependent Technique (HDT), the Edge Maximization Technique (EMT), and the visual technique, and compares them with one another to choose the best threshold segmentation technique. |
Salem Saleh Al-amri; N. V. Kalyankar; S. D. Khamitkar; |
2010 | 8 | Fast L1-Minimization Algorithms For Robust Face Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, our study addresses the speed and scalability of its algorithms. |
Allen Y. Yang; Zihan Zhou; Arvind Ganesh; S. Shankar Sastry; Yi Ma; |
2010 | 9 | Fast Inference In Sparse Coding Algorithms With Applications To Object Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose a simple and efficient algorithm to learn basis functions. |
Koray Kavukcuoglu; Marc’Aurelio Ranzato; Yann LeCun; |
2010 | 10 | Hybrid Linear Modeling Via Local Best-fit Flats IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple and fast geometric method for modeling data by a union of affine subspaces. |
Teng Zhang; Arthur Szlam; Yi Wang; Gilad Lerman; |
2010 | 11 | Automatic Image Segmentation By Dynamic Region Merging IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the automatic image segmentation problem in a region merging style. |
Bo Peng; Lei Zhang; David Zhang; |
2010 | 12 | Feature Level Fusion Of Face And Fingerprint Biometrics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aim of this paper is to study the fusion at feature extraction level for face and fingerprint biometrics. |
Ajita Rattani; Dakshina Ranjan Kisku; Manuele Bicego; Massimo Tistarelli; |
2010 | 13 | Automatic Detection Of Blue-White Veil And Related Structures In Dermoscopy Images IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we present a machine learning approach to the detection of blue-white veil and related structures in dermoscopy images. |
M. EMRE CELEBI et. al. |
2010 | 14 | Segmentation Of Natural Images By Texture And Boundary Compression IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). |
Hossein Mobahi; Shankar R. Rao; Allen Y. Yang; Shankar S. Sastry; Yi Ma; |
2010 | 15 | Classification With Scattering Operators IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex … |
Joan Bruna; Stéphane Mallat; |
2010 | 16 | Combining Multiple Feature Extraction Techniques For Handwritten Devnagari Character Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an OCR for Handwritten Devnagari Characters. |
Sandhya Arora; Debotosh Bhattacharjee; Mita Nasipuri; Dipak Kumar Basu; Mahantapas Kundu; |
2010 | 17 | Performance Comparison Of SVM And ANN For Handwritten Devnagari Character Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss the characteristics of some classification methods that have been successfully applied to handwritten Devnagari character recognition, and the results of the SVM and ANN classification methods applied to handwritten Devnagari characters. |
SANDHYA ARORA et. al. |
2010 | 18 | A Comparative Study Of Removal Noise From Remote Sensing Image IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies three types of noise: Salt and Pepper (SPN), Random variation Impulse Noise (RVIN), and Speckle (SPKN). |
Salem Saleh Al-amri; N. V. Kalyankar; S. D. Khamitkar; |
2010 | 19 | The Projected GSURE For Automatic Parameter Tuning In Iterative Shrinkage Methods IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we focus on optimally selecting such parameters in iterative shrinkage methods for image deblurring and image zooming. |
Raja Giryes; Michael Elad; Yonina C Eldar; |
2010 | 20 | Face Identification By SIFT-based Complete Graph Topology IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new face identification system based on Graph Matching Technique on SIFT features extracted from face images. |
Dakshina Ranjan Kisku; Ajita Rattani; Enrico Grosso; Massimo Tistarelli; |
2010 | 21 | Nonlinear Vector Filtering For Impulsive Noise Removal From Color Images IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a comprehensive survey of 48 filters for impulsive noise removal from color images is presented. |
M. Emre Celebi; Hassan A. Kingravi; Y. Alp Aslandogan; |
2010 | 22 | Hybrid Medical Image Classification Using Association Rule Mining With Decision Tree Algorithm IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The two image mining approaches with a hybrid manner have been proposed in this paper. |
P. Rajendran; M. Madheswaran; |
2010 | 23 | Handwritten Bangla Basic And Compound Character Recognition Using MLP And SVM Classifier IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A novel approach for recognition of handwritten compound Bangla characters, along with the Basic characters of Bangla alphabet, is presented here. |
NIBARAN DAS et. al. |
2010 | 24 | Generalized Tree-Based Wavelet Transform IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new wavelet transform applicable to functions defined on graphs, high dimensional data and networks. |
Idan Ram; Michael Elad; Israel Cohen; |
2010 | 25 | An Explicit Nonlinear Mapping For Manifold Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an explicit nonlinear mapping is proposed for manifold learning, based on the assumption that there exists a polynomial mapping between the high-dimensional data samples and their low-dimensional representations. |
Hong Qiao; Peng Zhang; Di Wang; Bo Zhang; |
2010 | 26 | Real-Time Implementation Of Order-Statistics Based Directional Filters IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce two methods to speed up these filters. |
M. Emre Celebi; |
2010 | 27 | A Family Of Statistical Symmetric Divergences Based On Jensen’s Inequality IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel parametric family of symmetric information-theoretic distances based on Jensen’s inequality for a convex functional generator. |
Frank Nielsen; |
2010 | 28 | Active Testing For Face Detection And Localization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a novel search technique, which uses a hierarchical model and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images. |
Raphael Sznitman; Bruno Jedynak; |
2010 | 29 | Real-time Robust Principal Components’ Pursuit IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the recent work of Candes et al, the problem of recovering low rank matrix corrupted by i.i.d. sparse outliers is studied and a very elegant solution, principal component pursuit, is proposed. |
Chenlu Qiu; Namrata Vaswani; |
2010 | 30 | Scalable Large-Margin Mahalanobis Distance Metric Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a fast and scalable algorithm to learn a Mahalanobis distance metric. |
Chunhua Shen; Junae Kim; Lei Wang; |