Journals starting with iccv

* 3D Hand Pose Reconstruction Using Specialized Mappings
* 3D LAMP: A New Layered Panoramic Representation
* 3D Object Recognition Using Shape Similarity-Based Aspect Graph
* 3D Object Tracking Using Shape-Encoded Particle Propagation
* 3D-Mode: A 3D Modelling and Measurement System Using a Few Photos
* 4-Sensor Camera Calibration for Image Representation Invariant to Shading, Shadows, Lighting, and Specularities
* Accurate Catadioptric Calibration for Real-time Pose Estimation of Room-size Environments
* Accurate Optical Flow in Noisy Image Sequences
* Affine 3-D Reconstruction from Two Projective Images of Independently Translating Planes
* Affine Calibration from Moving Objects
* Affine Invariant Erosion of 3D Shapes
* Alignment of Non-Overlapping Sequences
* Ambiguous Configurations for the 1D Structure and Motion Problem
* AMILab, a 2D/3D Image Processing Software
* Articulated Soft Objects for Video-based Body Modeling
* Automated 3D PDM Construction Using Deformable Models
* Automatic Registration of 2-D with 3-D Imagery in Urban Environments
* Automatic Segmentation and Indexing in a Database of Bird Images
* Background Model Initialization Algorithm for Video Surveillance, A
* Beyond Lambert: Reconstructing Surfaces with Arbitrary BRDFs
* Biologically Motivated, Precise and Simple Calibration and Reconstruction Using a Stereo Light Microscope
* Blind Removal of Image Non-Linearities
* BraMBLe: A Bayesian Multiple-Blob Tracker
* BRDF/BTF Measurement Device
* Calibration with Robust Use of Cheirality by Quasi-Affine Reconstruction of the Set of Camera Projection Centres
* Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds
* Capturing Natural Hand Articulation
* Car Detection in Low Resolution Aerial Images
* Caustics of Catadioptric Cameras
* Characterization of Inherent Stereo Ambiguities, A
* Cheirality in Epipolar Geometry
* Classifying and Solving Minimal Structure and Motion Problems with Missing Data
* Cloning Your Own Face with a Desktop Camera
* Co-inference Approach to Robust Visual Tracking, A
* Color Constancy Using KL-Divergence
* Color Eigenflows: Statistical Modeling of Joint Color Changes
* Colour Photometric Stereo: Simultaneous Reconstruction of Local Gradient and Colour of Rough Textured Surfaces
* Combining Single View Recognition and Multiple View Stereo for Architectural Scenes
* Computationally Efficient Face Detection
* Computing Visual Correspondence with Occlusions via Graph Cuts
* Concentric Mosaic(s), Planar Motion and 1D Cameras
* Confidence and Curvature Estimation of Curvilinear Structures in 3-D
* Constrained Active Appearance Models
* Continuous Global Evidence-Based Bayesian Modality Fusion for Simultaneous Tracking of Multiple Objects
* Control of Home Appliances Using Face and Hand Sign Recognition
* Curve Evolution Approach for Image Segmentation Using Adaptive Flows, A
* Data-Driven Model for Monocular Face Tracking, A
* Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics, A
* Demonstration of Handprinted Symbol Recognition, A
* Demonstration of Segmentation with Interactive Graph Cuts
* Depth from Defocus in Presence of Partial Self Occlusion
* Deriving Intrinsic Images from Image Sequences
* Determining Reflectance Parameters and Illumination Distribution from a Sparse Set of Images for View-dependent Image Synthesis
* Dimensional Analysis of Image Motion
* Do Ambiguous Reconstructions Always Give Ambiguous Images?
* Do We Really Have to Consider Covariance Matrices for Image Features?
* Document Restoration Using 3D Shape: A General Deskewing Algorithm for Arbitrarily Warped Documents
* Dynamic Textures
* Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics, The
* Efficient Dense Depth Estimation from Dense Multiperspective Panoramas
* Efficient Sequential Karhunen-Loeve Basis Extraction
* EM Algorithm for Video Summarization, Generative Model Approach, An
* Empirical Filter Estimation for Subpixel Interpolation and Matching
* Error Analysis of Pure Rotation-Based Self-Calibration
* Estimation and Interpretation of Discontinuities in Optical Flow Fields
* Euclidean Reconstruction and Auto-Calibration from Continuous Motion
* Example-Based Facial Sketch Generation with Non-parametric Sampling
* Face Recognition with Support Vector Machines: Global versus Component-based Approach
* FaceTracker: A Human Face Tracking and Facial Organ Localizing System
* Fast Algorithm for Nearest Neighbor Search Based on a Lower Bound Tree
* Feature Based Object Recognition using Statistical Occlusion Models with One-to-one Correspondence
* Feature Selection from Huge Feature Sets
* Finding Anomalies in an Arbitrary Image
* Flux Maximizing Geometric Flows
* Folds and Cuts: How Shading Flows Into Edges
* Framework for Segmentation of Talk and Game Shows, A
* Gabor Feature Classifier for Face Recognition, A
* General Imaging Model and a Method for Finding its Parameters, A
* Generalized Mosaicing
* Geometrical Fundamentals of Polycentric Panoramas
* Global Matching Framework for Stereo Computation, A
* Gradient Vector Flow Fast Geodesic Active Contours
* Guiding Random Particles by Deterministic Search
* Harmonics Extraction Based on Higher Order Statistics Spectrum Decomposition for a Unified Texture Model
* Hierarchical Pre-Segmentation without Prior Knowledge
* High Dynamic Range Panoramic Imaging
* Human Tracking in Multiple Cameras
* Human Tracking with Mixtures of Trees
* Illumination Insensitive Eigenspaces
* Image Detection Under Varying Illumination and Pose
* Image Segmentation by Data-Driven Markov Chain Monte Carlo
* Image Segmentation with Minimum Mean Cut
* Impact of Viewing Geometry on Vision Through the Atmosphere, The
* Improving AR using Shadows Arising from Natural Illumination Distribution in Video Sequences
* Incorporating Differential Constraints in a 3D Reconstruction Process. Application to Stereo
* Incorporating Process Knowledge into Object Recognition for Assemblies
* Indexing Based on Scale Invariant Interest Points
* Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images
* Invariant Mixture Recognition in Hyperspectral Images
* JetStream: Probabilistic Contour Extraction with Particles
* Joint Feature Distributions for Image Correspondence
* Kernel Machine Based Learning for Multi-View Face Detection and Pose Estimation
* KGBR Viewpoint-Lighting Ambiguity and its Resolution by Generic Constraints, The
* Lambertian Reflectance and Linear Subspaces
* Learning Image Statistics for Bayesian Tracking
* Learning Inhomogeneous Gibbs Model of Faces by Minimax Entropy
* Learning Local Evidence for Shading and Reflectance
* Learning Low Dimensional Invariant Signature of 3-D Object under Varying View and Illumination from 2-D Appearances
* Learning Spectral Calibration Parameters for Color Inspection
* Learning the Semantics of Words and Pictures
* Linear Dual-Space Approach to 3D Surface Reconstruction from Occluding Contours using Algebraic Surfaces, A
* Linear Multi View Reconstruction and Camera Recovery
* Markov Face Models
* Matching Shapes
* Maximum Likelihood Framework for Iterative Eigendecomposition, A
* Model-Based Bundle Adjustment with Application to Face Modeling
* Model-Based Initialisation of Vehicle Tracking: Dependency on Illumination
* Modelling Faces Dynamically across Views and Over Time
* Motion Estimation from Disparity Images
* Motion Segmentation by Subspace Separation and Model Selection
* Multi-Agent Event Recognition
* Multi-Frame Infinitesimal Motion Model for the Reconstruction of (Dynamic) Scenes with Multiple Linearly Moving Objects
* Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance
* Multiple Motion Scene Reconstruction from Uncalibrated Views
* Multiple View Geometry of Non-planar Algebraic Curves
* New Imaging Model, A
* New Perspectives on Geometric Reflection Theory From Rough Surfaces
* Noise in Bilinear Problems
* Novel Modeling Algorithm for Shape Recovery of Unknown Topology, A
* Occlusion Robust Adaptive Template Tracking
* Omni-Rig: Linear Self-Recalibration of a Rig with Varying Internal and External Parameters
* On Cosine-Fourth and Vignetting Effects in Real Lenses
* On Projection Matrices P^k -> P^2, k=3,...,6, and their Applications in Computer Vision
* On the Complexity of Probabilistic Image Retrieval
* Optimal Method for the Affine F-Matrix and Its Uncertainty Estimation in the Sense of both Noise and Outliers
* Optimal Motion Estimation from Multiview Normalized Epipolar Constraint
* Pairwise Face Recognition
* Parallel-Perspective Stereo Mosaics
* People Tracking Using Hybrid Monte Carlo Filtering
* Performance Evaluation of Stereo for Tele-presence
* Photometric Image-Based Rendering for Image Generation in Arbitrary Illumination
* Physics-based Model Acquisition and Identification in Airborne Spectral Images
* Plan-View Trajectory Estimation with Dense Stereo Background Models
* Plane-based Projective Reconstruction
* Precise Sub-Pixel Estimation on Area-Based Matching
* Probabilistic Framework for Segmenting People Under Occlusion
* Probabilistic Framework for Space Carving, A
* Probabilistic Learning and Modelling of Object Dynamics for Tracking
* Probabilistic Tracking in a Metric Space
* Projective Structure and Motion from Two Views of a Piecewise Planar Scene
* Propagation of Innovative Information in Non-Linear Least-Squares Structure from Motion
* Pseudo-Distance Map for the Segmentation-Free Skeletonization of Gray-Scale Images, A
* Real-time Automated Concurrent Visual Tracking of Many Animals and Subsequent Behavioral Compilation
* Real-Time Feature Tracking and Outlier Rejection with Changes in Illumination
* Real-Time Tracking of Highly Articulated Structures in the Presence of Noisy Measurements
* Real-Time Video Phase-Locked Loops
* Real-time Virtual Object Insertion
* Recognition of Shapes by Editing Shock Graphs
* Recognizing Large 3-D Objects through Next View Planning using an Uncalibrated Camera
* Reconstructing Surfaces Using Anisotropic Basis Functions
* Reducing Drift in Parametric Motion Tracking
* Region Extraction from Multiple Images
* Region Segmentation via Deformable Model-Guided Split and Merge
* Robust Histogram Construction from Color Invariants
* Robust Interest Points Matching Algorithm, A
* Robust Principal Component Analysis for Computer Vision
* Robust Real-Time Face Detection
* Segmentation and Range Sensing Using a Moving-Aperture Lens
* Segmentation of the Left Ventricle in Cardiac MR Images
* Segmentation with Pairwise Attraction and Repulsion
* Selecting Objects With Freehand Sketches
* Self-Calibrating Camera Projector Systems for Interactive Displays and Presentations
* Self-Calibration of a Stereo Rig Using Monocular Epipolar Geometry
* Self-Supervised Learning for Object Recognition based on Kernel Discriminant-EM Algorithm
* Separating Appearance from Deformation
* Separation of Multiple Objects in Motion Images by Clustering
* Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking
* Shadow Carving
* Shape Deformation: SVM Regression and Application to Medical Image Segmentation
* Shape from Texture and Integrability
* Simple and Efficient Template Matching Algorithm, A
* Simultaneous Estimation of Super-Resolved Intensity and Depth Maps from Low Resolution Defocused Observations of a Scene
* Smarter Presentations: Exploiting Homography in Camera-Projector Systems
* Space of All Stereo Images, The
* Sparse PCA: Extracting Multi-scale Structure from Data
* Split Aperture Imaging for High Dynamic Range
* Statistical Approach to Background Subtraction for Surveillance Systems, A
* Statistical Calibration of the CCD Imaging Process
* Statistical Context Priming for Object Detection
* Stereo Matching by Compact Windows via Minimum Ratio Cycle
* Stereoscopic Segmentation
* Stochastic Processes in Vision: From Langevin to Beltrami
* Stochastic Rigidity: Image Registration for Nowhere-Static Scenes
* Stochastic Road Shape Estimation
* Stripe Boundary Codes for Real-Time Structured-Light Range Scanning of Moving Objects
* Structure and Motion from Silhouettes
* Structure from Motion Using Sequential Monte Carlo Methods
* Surface Matching by 3D Point's Fingerprint
* Tele-Graffiti: A Pen and Paper-Based Remote Sketching System
* Template Matching Approach to Content Based Image Indexing by Low Dimensional Euclidean Embedding
* Topology Free Hidden Markov Models: Application to Background Modeling
* Towards Real-Time Multi-Modality 3-D Medical Image Registration
* True Single View Point Cone Mirror Omni-Directional Catadioptric System
* Trust-Region Methods for Real-Time Tracking
* Unambiguous Determination of Shape from Photometric Stereo with Unknown Light Sources
* Uncalibrated Motion Capture Exploiting Articulated Structure Constraints
* Using Scene Constraints during the Calibration Procedure
* Variable Bandwidth Mean Shift and Data-Driven Scale Selection, The
* Variational Model for Filling-in Gray Level and Color Images, A
* Versatile Method for Trifocal Tensor Estimation, A
* Very High Accuracy Velocity Estimation using Orientation Tensors, Parametric Motion, and Simultaneous Segmentation of the Motion Field
* Video Georegistration: Algorithm and Quantitative Evaluation
* Video Objects Segmentation Using Eulerian Region-Based Active Contours
* Video Phase-Locked Loops in Gait Recognition
* View-Based Clustering of Object Appearances Based on Independent Subspace Analysis
* Viewpoint Invariant Texture Matching and Wide Baseline Stereo
* Visual Learning by Integrating Descriptive and Generative Methods
* Visual Servoing Invariant to Changes in Camera Intrinsic Parameters
* What Value Covariance Information in Estimating Vision Parameters?
220 for ICCV01

* 3D Tracking = Classification + Interpolation
* Active concept learning for image retrieval in dynamic databases
* Adaptive Dynamic Range Imaging: Optical Control of Pixel Exposures over Space and Time
* affine invariant deformable shape representation for general curves, An
* Affine-invariant local descriptors and neighborhood statistics for texture recognition
* Appearance sampling for obtaining a set of basis images for variable illumination
* Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations
* Assessing accuracy factors in deformable 2d/3d medical image registration using a statistical pelvis model
* Autocalibration of a projector-screen-camera system: theory and algorithm for screen-to-camera homography estimation
* automatic drowning detection surveillance system for challenging outdoor pool environments, An
* Automatic video summarization by graph modeling
* Automatically labeling video data using multi-class active learning
* background layer model for object tracking through occlusion, A
* Background modeling and subtraction of dynamic scenes
* Bayesian approach to unsupervised one-shot learning of object categories, A
* Bayesian clustering of optical flow fields
* Bayesian network framework for relational shape matching, A
* Beltrami flow over implicit manifolds, The
* Binocular Helmholtz Stereopsis
* Boosting chain learning for object detection
* Calibrating pan-tilt cameras in wide-area surveillance networks
* Calibration of a hybrid camera network
* Camera calibration using spheres: a semi-definite programming approach
* Camera Calibration with Known Rotation
* Capturing subtle facial motions in 3d face tracking
* Caratheodory-Fejer Approach to Robust Multiframe Tracking, A
* Catadioptric Camera Calibration Using Geometric Invariants
* catchment feature model for multimodal language analysis, The
* Circular motion geometry by minimal 2 points in 4 images
* class of photometric invariants: separating material from shape and illumination, A
* Color edge detection by photometric quasi-invariants
* Combinatorial constraints on multiple projections of a set of points
* Combining gradient and albedo data for rotation invariant classification of 3d surface texture
* Comparison of Graph Cuts with Belief Propagation for Stereo, Using Identical MRF Parameters
* Computing geodesics and minimal surfaces via graph cuts
* Computing MAP Trajectories by Representing, Propagating and Combining PDFs Over Groups
* Conditional feature sensitivity: a unifying view on active recognition and feature selection
* Constraining human body tracking
* Context-Based Vision System for Place and Object Recognition
* Controlling model complexity in flow estimation
* Counting people in crowds with a real-time network of simple image sensors
* Cumulative residual entropy, a new measure of information and its application to image alignment
* cylindrical surface model to rectify the bound document image, A
* Dealing with textureless regions and specular highlights: A progressive space carving scheme using a novel photo-consistency measure
* Dense matching of multiple wide-baseline views
* Dense shape reconstruction of a moving object under arbitrary, unknown lighting
* Detecting Pedestrians Using Patterns of Motion and Appearance
* Determining reflectance and light position from a single image without distant illumination assumption
* Discriminative random fields: a discriminative framework for contextual interaction in classification
* Dominant sets and hierarchical clustering
* Dynamic stroke information analysis for video-based handwritten Chinese character recognition
* Dynamic texture segmentation
* efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures, An
* Efficient, robust and accurate fitting of a 3d morphable model
* Entropy-of-likelihood feature selection for image correspondence
* Epitomic analysis of appearance and shape
* Eye design in the plenoptic space of light rays
* Eye gaze estimation from a single image of one eye
* Face sketch synthesis and recognition
* Facial expression decomposition
* Facial expression understanding in image sequences using dynamic and active visual information fusion
* Fast intensity-based 2d-3d image registration of clinical data using light fields
* Fast Pose Estimation with Parameter-Sensitive Hashing
* Fast stereo matching using reliability-based dynamic programming and consistency constraints
* Fast vehicle detection with probabilistic feature grouping and its application to vehicle tracking
* Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weighted-based approach
* Filtering using a tree-based estimator
* Fragmentation in the vision of scenes
* Fusion of Static and Dynamic Body Biometrics for Gait Recognition
* Gamut Constrained Illuminant Estimation
* Gaze manipulation for one-to-one teleconferencing
* Geometric segmentation of perspective images based on symmetry groups
* Globally convergent autocalibration
* Good continuations in digital image level lines
* Graph partition by Swendsen-Wang cuts
* High resolution terrain mapping using low altitude aerial stereo imagery
* Highlight removal by illumination-constrained inpainting
* How to deal with point correspondences and tangential velocities in the level set framework
* Image Parsing: Unifying Segmentation, Detection, and Recognition
* Image registration with global and local luminance alignment
* Image spaces and video trajectories: Using Isomap to explore video sequences
* Image statistics and anisotropic diffusion
* Image-Based Rendering Using Image-Based Priors
* Images as bags of pixels
* Improved fast Gauss transform and efficient kernel density estimation
* Incorporating the Torrance and Sparrow model of reflectance in uncalibrated photometric stereo
* Inferring 3D Structure with a Statistical Image-Based Shape Model
* Information theoretic focal length selection for real-time active 3-d object tracking
* Integrated Edge and Junction Detection with the Boundary Tensor
* Joint region tracking with switching hypothesized measurements
* Landmark-based shape deformation with topology-preserving constraints
* Large-scale event detection using semi-hidden Markov models
* Learning a classification model for segmentation
* Learning a locality preserving subspace for visual recognition
* Learning and Inferring Image Segmentations Using the GBP Typical Cut Algorithm
* Learning how to inpaint from global image statistics
* Learning pedestrian models for silhouette refinement
* Linear multi-view reconstruction of points, lines, planes and cameras using a reference plane
* Local Projective Shape of Smooth Surfaces and Their Outlines, The
* Machine learning and multiscale methods in the identification of bivalve larvae
* Maintaining multi-modality through mixture tracking
* Markov-based failure prediction for human motion analysis
* Mean shift based clustering in high dimensions: A Texture Classification Example
* Meshfree particle method
* Minimally-supervised classification using multiple observation sets
* Minimum risk distance measure for object recognition
* Mirrors in motion: epipolar geometry and motion estimation
* model-based approach for automated feature extraction in fundus images, A
* Model-based multiple view reconstruction of people
* Modeling Textured Motion: Particle, Wave and Sketch
* multi-scale generative model for animate shapes and parts, A
* Multiclass spectral clustering
* Multiple-cue illumination estimation in textured scenes
* Multiple-view structure and motion from line correspondences
* Multiview reconstruction of space curves
* Natural Image Statistics for Natural Image Segmentation
* new paradigm for recognizing 3-d object shapes from range data, A
* new perspective [on] shape-from-shading, A
* Noniterative Greedy Algorithm for Multiframe Point Correspondence, A
* Nonmetric lens distortion calibration: Closed-form solutions, robust estimation and model selection
* novel approach for texture shape recovery, A
* Object recognition with informative features and linear classification
* Obstacle detection using projective invariant and vanishing lines
* On exploiting occlusions in multiple-view geometry
* On the epipolar geometry of the crossed-slits projection
* On the use of marginal statistics of subband images
* Online Selection of Discriminative Tracking Features
* Outlier correction in image sequences for the affine camera
* Paracatadioptric camera calibration using lines
* Perspective shape from shading and viscosity solutions
* Phenomenological eigenfunctions for image irradiance
* Photo-consistent 3d fire by flame-sheet decomposition
* Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices
* Polarization-based inverse rendering from a single view
* Polarization-based transparent surface modeling from two views
* Preemptive RANSAC for live structure and motion estimation
* Probabilistic bilinear models for appearance-based vision
* Ranking prior likelihood distributions for Bayesian shape localization framework
* Real-Time Pattern Matching Using Projection Kernels
* Real-time simultaneous localisation and mapping with a single camera
* Recognising panoramas
* Recognition of group activities using dynamic probabilistic networks
* Recognition with local features: the kernel recipe
* Recognizing action at a distance
* Recognizing human action efforts: An adaptive three-mode PCA framework
* Recovery of epipolar geometry as a manifold fitting problem
* Reflectance-based classification of color edges
* Regression Based Bandwidth Selection for Segmentation Using Parzen Windows
* Reinforcement learning for combining relevance feedback techniques
* Reliable recovery of piled box-like objects via parabolically deformable superquadrics
* Robust regression with projection based m-estimators
* Scene modeling based on constraint system decomposition techniques
* segmentation algorithm for contrast-enhanced images, A
* Segmenting foreground objects from a dynamic textured background via a robust Kalman filter
* Selection of scale-invariant parts for object class recognition
* Separating Reflection Components of Textured Surfaces Using a Single Image
* Shape and motion under varying illumination: unifying structure from motion, photometric stereo, and multi-view stereo
* Shape gradients for histogram segmentation using active contours
* Shape representation via harmonic embedding
* Space-time interest points
* sparse probabilistic learning algorithm for real-time tracking, A
* Spectral partitioning for structure from motion
* Statistical background subtraction for a mobile observer
* Stochastic refinement of the visual hull to satisfy photometric and silhouette consistency constraints
* Surface classification using conformal structures
* Surface reconstruction by integrating 3d and 2d data of multiple views
* Surface reconstruction from feature based stereo
* Surface reflectance modeling of real objects with interreflections
* SVM-based nonparametric discriminant analysis, an application to face detection
* Tales of shape and radiance in multi-view stereo
* Texture segmentation by multiscale aggregation of filter responses and shape elements
* theory of multiplexed illumination, A
* Towards a mathematical theory of primal sketch and sketchability
* Towards direct recovery of shape and motion parameters from image sequences
* Towards Gauge Invariant Bundle Adjustment: A Solution Based on Gauge Dependent Damping
* Tracking across multiple cameras with disjoint views
* Tracking articulated body by dynamic Markov network
* Tracking articulated hand motion with eigen dynamics analysis
* Tracking Objects Using Density Matching and Shape Priors
* Two-frame wide baseline matching
* Unified subspace analysis for face recognition
* Unsupervised image translation
* Unsupervised improvement of visual detectors using co-training
* Unsupervised non-parametric region segmentation using level sets
* Using prior shape and intensity profile in medical image segmentation
* Using specularities for recognition
* Using temporal coherence to build models of animals
* Variable bandwidth QMDPE and its application in robust optical flow estimation
* Variational frameworks for DT-MRI estimation, regularization and visualization
* Variational space-time motion segmentation
* Variational stereovision and 3d scene flow estimation with statistical similarity measures
* Video Google: A text retrieval approach to object matching in videos
* Video input driven animation (VIDA)
* View-invariant alignment and matching of video sequences
* Visual correspondence using energy minimization and mutual information
* Voxel carving for specular surfaces
* Weighted and robust incremental method for subspace learning
* What does motion reveal about transparency?
199 for ICCV03

* 3D Object Reconstruction from a Single 2D Line Drawing without Hidden Lines
* 3D Shape Recognition and Reconstruction Based on Line Element Geometry
* 8-Point Algorithm Revisited: Factorized 8-Point Algorithm
* Actions as Space-Time Shapes
* Active Search for Real-Time Vision
* Adaptive Enhancement of Cardiac Magnetic Resonance (CMR) Images
* Algebraic Approach to Surface Reconstruction from Gradient Fields, An
* Appearance Modeling Under Geometric Context
* Automatic 3D Face Modeling from Video
* Avoiding the Streetlight Effect: Tracking by Exploring Likelihood Modes
* Axis-Based Representation for Recognition, An
* Background Estimation as a Labeling Problem
* Basic Gray Level Aura Matrices: Theory and its Application to Texture Synthesis
* Bayesian Approach for Shadow Extraction from a Single Image, A
* Bayesian Autocalibration for Surveillance
* Bayesian Body Localization Using Mixture of Nonlinear Shape Models
* Bayesian Structural Content Abstraction for Region-Level Image Authentication
* Behaviour Understanding in Video: A Combined Method
* Beyond Trees: Common-Factor Models for 2D Human Pose Recovery
* Bi-Directional Tracking Using Trajectory Segment Analysis
* Bilinear Illumination Model for Robust Face Recognition, A
* Bottom-up/Top-Down Image Parsing by Attribute Graph Grammar
* BRDF Invariant Stereo Using Light Transport Constancy
* Building a Classification Cascade for Visual Identification from One Example
* Can Two Specular Pixels Calibrate Photometric Stereo?
* Class-Specific Material Categorisation
* Closely Coupled Object Detection and Segmentation
* Combining Generative Models and Fisher Kernels for Object Recognition
* Combining Image Regions and Human Activity for Indirect Object Recognition in Indoor Wide-Angle Views
* Common Pattern Discovery Using Earth Mover's Distance and Local Flow Maximization
* Conditional Random Fields for Contextual Human Motion Recognition
* Conformal Metrics and True Gradient Flows for Curves
* Consistent Segmentation for Optical Flow Estimation
* Consistent Surface Color for Texturing Large Objects in Outdoor Scenes
* Contour-Based Learning for Object Detection
* Convex Grouping Combining Boundary and Region Information
* Coupled Space Learning for Image Style Transformation
* Creating Efficient Codebooks for Visual Recognition
* Deformation Invariant Image Matching
* Degenerate Cases and Closed-form Solutions for Camera Calibration with One-Dimensional Objects
* Designing Spatially Coherent Minimizing Flows for Variational Problems Based on Active Contours
* Detecting Irregularities in Images and in Video
* Detecting Rotational Symmetries
* Detection and Tracking of Moving Objects from a Moving Platform in Presence of Strong Parallax
* Detection of Concentric Circles for Camera Calibration
* Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors
* Detection, Analysis and Matching of Hair
* Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination
* Discovering Objects and their Localization in Images
* Dynamic Measurement Clustering to Aid Real Time Tracking
* Dynamic Refraction Stereo
* Edge-Based Rich Representation for Vehicle Classification
* Efficient Block Noise Removal Based on Nonlinear Manifolds
* Efficient Learning of Relational Object Class Models
* Efficient Model-Based 3D Tracking of Deformable Objects
* Efficient Visual Event Detection Using Volumetric Features
* Efficiently Registering Video into Panoramic Mosaics
* Efficiently Solving Dynamic Markov Random Fields Using Graph Cuts
* Eliminating Structure and Intensity Misalignment in Image Stitching
* Enhanced Correlation-Based Method for Stereo Correspondence with Sub-Pixel Accuracy, An
* Ensemble Prior of Image Structure for Cross-Modal Inference, An
* Ensuring Color Consistency across Multiple Cameras
* Evaluation of Features Detectors and Descriptors Based on 3D Objects
* Expectation Maximization Approach to the Synergy between Image Segmentation and Object Categorization, An
* Exploring the Space of a Human Action
* Face Recognition by Stepwise Nonparametric Margin Maximum Criterion
* Face Recognition in the Presence of Multiple Illumination Sources
* Face Recognition with MRC-Boosting
* Fast Global Kernel Density Mode Seeking with Application to Localisation and Tracking
* Fast Multiple Object Tracking via a Hierarchical Particle Filter
* Fast Recognition of Multi-View Faces with Feature Selection
* Fast Texture-Based Tracking and Delineation Using Texture Entropy
* Feature Hierarchies for Object Classification
* Features for Recognition: Viewpoint Invariance for Non-Planar Scenes
* Finding Tree Structures by Grouping Symmetries
* Fitting Globally Stabilized Algebraic Surfaces to Range Data
* Fixed Point Probability Field for Complex Occlusion Handling
* Fundamental Matrix for Cameras with Radial Distortion
* Fusing Points and Lines for High Performance Tracking
* Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
* General Framework for Temporal Video Scene Segmentation, A
* Generative/Discriminative Learning Algorithm for Image Classification, A
* Geometric and Photometric Restoration of Distorted Documents
* Geometric Context from a Single Image
* Geometric Invariants and Applications under Catadioptric Camera Model
* Globally Optimal Estimates for Geometric Reconstruction Problems
* Globally Optimal Solutions for Energy Minimization in Stereo Vision Using Reweighted Belief Propagation
* Good Continuation of General 2D Visual Features: Dual Harmonic Models and Computational Inference
* Graph Cut Algorithm for Generalized Image Deconvolution, A
* Guiding Model Search Using Segmentation
* Hierarchical Field Framework for Unified Context-Based Classification, A
* High Resolution Tracking of Non-Rigid 3D Motion of Densely Sampled Data Using Harmonic Maps
* Hilbert Functions and Applications to the Estimation of Subspace Arrangements
* How Hard is 3-View Triangulation Really?
* Identifying Individuals in Video by Combining Generative and Discriminative Head Models
* Image Based Regression Using Boosting Method
* Image Statistics Based on Diffeomorphic Matching
* Improved Sub-pixel Stereo Correspondences through Symmetric Refinement
* Incorporating Visual Knowledge Representation in Stereo Reconstruction
* Incremental Discovery of Object Parts in Video Sequences
* Inference of Non-Overlapping Camera Network Topology by Measuring Statistical Dependence
* Integrated Framework for Image Segmentation and Perceptual Grouping, An
* Integrated Spatial and Frequency Domain 2D Motion Segmentation and Estimation
* Integrating Representative and Discriminative Models for Object Category Detection
* Integrating the Effects of Motion, Illumination and Structure in Video Sequences
* Integration of Conditionally Dependent Object Features for Robust Figure/Background Segmentation
* Is ICA Significantly Better than PCA for Face Recognition?
* Is Levenberg-Marquardt the Most Efficient Optimization Algorithm for Implementing Bundle Adjustment?
* Iterative Optimization Approach for Unified Image Segmentation and Matting, An
* Joint Haar-like Features for Face Detection
* KALMANSAC: Robust Filtering by Consensus
* Kernel-Based Multifactor Analysis for Image Synthesis and Recognition
* Large Deformation Diffeomorphic Metric Mapping of Fiber Orientations
* Layered Active Appearance Models
* Learning a Sparse, Corner-Based Representation for Time-varying Background Modeling
* Learning and Inference in Parametric Switching Linear Dynamical Systems
* Learning Effective Image Metrics from Few Pairwise Examples
* Learning Hierarchical Models of Scenes, Objects, and Parts
* Learning Layered Motion Segmentation of Video
* Learning Models for Predicting Recognition Performance
* Learning Non-Generative Grammatical Models for Document Analysis
* Learning Non-Negative Sparse Image Codes by Convex Programming
* Learning Object Categories from Google's Image Search
* Learning the Probability of Correspondences without Ground Truth
* Lighting Normalization with Generic Intrinsic Illumination Subspace for Face Recognition
* Linear Approaches to Camera Calibration from Sphere Images or Active Intrinsic Calibration Using Vanishing Points
* Local Features for Object Class Recognition
* Local Gabor Binary Pattern Histogram Sequence (LGBPHS): A Novel Non-Statistical Model for Face Representation and Recognition
* LOCUS: Learning Object Classes with Unsupervised Segmentation
* Manifold Clustering
* Maximum Entropy Framework for Part-Based Texture and Object Recognition, A
* Mesh Optimization Using an Inconsistency Detection Template
* Mixtures of Dynamic Textures
* Model-Based Vehicle Segmentation Method for Tracking, A
* Modeling Scenes with Local Descriptors and Latent Aspects
* Modelling Shapes with Uncertainties: Higher Order Polynomials, Variable Bandwidth Kernels and Non Parametric Density Estimation
* More-Than-Topology-Preserving Flows for Active Contours and Polygons
* Multi-Modal Tensor Face for Simultaneous Super-Resolution and Recognition
* Multi-Scale Gesture Recognition from Time-Varying Contours
* Multi-Scale Hybrid Linear Model for Lossy Image Representation, A
* Multi-View AAM Fitting and Camera Calibration
* Multi-View Geometry of 1D Radial Cameras and its Application to Omnidirectional Camera Calibration
* Multi-View Reconstruction Using Photo-consistency and Exact Silhouette Constraints: A Maximum-Flow Formulation
* Multi-View Surface Reconstruction Using Polarization
* Multilevel Banded Graph Cuts Method for Fast Image Segmentation, A
* Multiperspective Projection and Collineation
* Multiple Light Sources and Reflectance Property Estimation Based on a Mixture of Spherical Distributions
* Multiple View Geometry and the L_inf-norm
* Mutual Information Regularized Bayesian Framework for Multiple Image Restoration
* Mutual Information-Based 3D Surface Matching with Applications to Face Recognition and Brain Mapping
* N-Dimensional Probability Density Function Transfer and its Application to Colour Transfer
* Neighborhood Preserving Embedding
* New Framework for Approximate Labeling via Graph Cuts, A
* Non-Negative Lighting and Specular Object Recognition
* Non-Orthogonal Binary Subspace and Its Applications in Computer Vision
* Non-Parametric Self-Calibration
* Object Categorization by Learned Universal Visual Dictionary
* Object Detection in Aerial Imagery Based on Enhanced Semi-Supervised Learning
* Object Recognition in High Clutter Images Using Line Features
* Object Tracking across Multiple Independently Moving Aerial Cameras
* Objective Image Fusion Performance Characterisation
* Okapi-Chamfer Matching for Articulated Object Recognition
* On Optimal Light Configurations in Photometric Stereo
* On Optimizing Template Matching via Performance Characterization
* On the Equivalence of Common Approaches to Lighting Insensitive Recognition
* On the Spatial Statistics of Optical Flow
* On-Line Density-Based Appearance Modeling for Object Tracking
* Opaque Document Imaging: Building Images of Inaccessible Texts
* Parameter-Free Radial Distortion Correction with Center of Distortion Estimation
* Passive Photometric Stereo from Motion
* Patch Based Blind Image Super Resolution
* Perceptual Scale-Space and Its Applications
* Periodic Motion Detection and Segmentation via Approximate Sequence Alignment
* Phase Field Models and Higher-Order Active Contours
* Photometric Stereo under Perspective Projection
* Practical Single Image Based Approach for Estimating Illumination Distribution from Shadows, A
* Prior-Based Segmentation by Projective Registration and Level Sets
* Priors for People Tracking from Small Training Sets
* Probabilistic Boosting-Tree: Learning Discriminative Models for Classification, Recognition, and Clustering
* Probabilistic Contour Extraction Using Hierarchical Shape Representation
* Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval, A
* Progressive Surface Reconstruction from Images Using a Local Prior
* Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, The
* Quasiconvex Optimization for Robust Geometric Reconstruction
* Randomized RANSAC with Sequential Probability Ratio Test
* Real-Time Interactively Distributed Multi-Object Tracking Using a Magnetic-Inertia Potential Model
* Realtime IBR with Omnidirectional Crossed-Slits Projection
* Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras
* Reconstructing the Geometry of Flowing Water
* Recovering Facial Shape and Albedo Using a Statistical Model of Surface Normal Direction
* Recovering Human Body Configurations Using Pairwise Constraints between Parts
* Recovering Photometric Properties of Multiple Strongly-Reflective, Partially-Transparent Surfaces from a Single Image
* Registration of Multimodal Fluorescein Images Sequence of the Retina
* Regular Polygon Detection
* Retrieval with Knowledge-driven Kernel Design: An Approach to Improving SVM-Based CBIR with Relevance Feedback
* Robust Algorithm for Point Set Registration Using Mixture of Gaussians, A
* Robust Path-Based Spectral Clustering with Application to Image Segmentation
* Robust Point Matching for Two-Dimensional Nonrigid Shapes
* Robust Structure from Motion and Identified Dynamics
* Scale-Invariant Contour Completion Using Conditional Random Fields
* Segmentation of Hybrid Motions via Hybrid Quadratic Surface Analysis
* Semi-Supervised Framework for Mapping Data to the Intrinsic Manifold, A
* Separating Reflections in Human Iris Images for Illumination Estimation
* Separating Transparent Layers of Repetitive Dynamic Behaviors
* Shadow Flow: A Recursive Method to Learn Moving Cast Shadows
* Shape and Appearance Repair for Incomplete Point Surfaces
* Shape and Spatially-Varying BRDFs from Photometric Stereo
* Shape Classifier Based on Generalized Probabilistic Descent Method with Hidden Markov Descriptor
* Shape from Symmetry
* Shape Parameter Optimization for AdaBoosted Active Shape Model
* Shape Recovery of 3D Data Obtained from a Moving Range Sensor by Using Image Sequences
* Shape-Based Segmentation Approach: An Improved Technique Using Level Sets, A
* Shapelets Correlated with Surface Normals Produce Surfaces
* Simultaneous Facial Action Tracking and Expression Recognition Using a Particle Filter
* Simultaneous Multiple 3D Motion Estimation via Mode Finding on Lie Groups
* Space-Time Scene Manifolds
* Sparse Image Coding Using a 3D Non-Negative Tensor Factorization
* Spectral Technique for Correspondence Problems Using Pairwise Constraints, A
* Spherical Matching for Temporal Correspondence of Non-Rigid Surfaces
* Squaring the Circles in Panoramas
* Stochastic Filter for Fluid Motion Tracking, A
* Structured Light in Scattering Media
* Supervised Learning Framework for Generic Object Detection in Images, A
* Surface Parameterization Using Riemann Surface Structure
* Symmetric Patch-Based Correspondence Model for Occlusion Handling, A
* TemporalBoost for Event Recognition
* Theoretical Limit on the Number of Effective Pixels that can be Optically Resolved on a Non-Planar Subject, A
* Theory of Inverse Light Transport, A
* Theory of Refractive and Specular 3D Shape by Light-Path Triangulation, A
* Towards Ultimate Motion Estimation: Combining Highest Accuracy with Real-Time Performance
* Uncalibrated Perspective Reconstruction of Deformable Structures
* Unifying Approach to Hard and Probabilistic Clustering, A
* Using Extended Light Sources for Modeling Object Appearance under Varying Illumination
* Using Eye Reflections for Face Recognition Under Varying Illumination
* Using Frontier Points to Recover Shape, Reflectance and Illumination
* Variational-Based Method to Extract Parametric Shapes from Images
* Vector Boosting for Rotation Invariant Multi-View Face Detection
* Vehicle Identification between Non-Overlapping Cameras without Direct Feature Matching
* Video Behaviour Profiling and Abnormality Detection without Manual Labelling
* Vignette and Exposure Calibration and Compensation
* Visual Learning Given Sparse Data of Unknown Complexity
* Visual Speech Recognition with Loosely Synchronized Feature Streams
* What Metrics Can Be Approximated by Geo-Cuts, Or Global Optimization of Length/Area and Flux
* When Does a Camera See Rain?
245 for ICCV05

* 3-D Metric Reconstruction and Registration of Images of Near-planar Surfaces
* 3D generic object categorization, localization and pose estimation
* 3D Model based Object Class Detection in An Arbitrary View
* 3D object recognition from range images using pyramid matching
* 3D Object Representation Using Transform and Scale Invariant 3D Features
* 3D Teacher for Car Detection in Aerial Images, A
* 3D-3D Registration Problem Revisited, The
* Accurate Non-Iterative O(n) Solution to the PnP Problem
* Action Recognition from Arbitrary Views using 3D Exemplars
* Active Learning with Gaussian Processes for Object Categorization
* Adaptive enhancement and noise reduction in very low light-level video
* Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning
* Applications of parametric maxflow in computer vision
* Articulated Shape Matching by Robust Alignment of Embedded Representations
* Automatic Cardiac View Classification of Echocardiogram
* Best of Both Worlds: Combining 3D Deformable Models with Active Shape Models, The
* Bilayer Stereo Matching
* Biologically Inspired System for Action Recognition, A
* Blurred/Non-Blurred Image Alignment using Sparseness Prior
* Boosting Invariance and Efficiency in Supervised Learning
* Bottom-up saliency is a discriminant process
* BRDF Acquisition with Basis Illumination
* Capacity Scaling for Graph Cuts in Vision
* Chaotic Invariants for Human Action Recognition
* Classification of Weakly-Labeled Data with Partial Equivalence Relations
* ClassMap: Efficient Multiclass Recognition via Embeddings
* Cluster Boosted Tree Classifier for Multi-View, Multi-Pose Object Detection
* Co-Tracking Using Semi-Supervised Support Vector Machines
* Component Based Deformable Model for Generalized Face Alignment, A
* Conditional State Space Models for Discriminative Motion Estimation
* Consistent Correspondence between Arbitrary Manifold Surfaces
* Contextual Distance for Data Perception
* Contour Grouping Based on Local Symmetry
* Convex Optimization for Deformable Surface 3-D Tracking
* Coplanar Shadowgrams for Acquiring Visual Hulls of Intricate Objects
* Correspondence labelling for wide-timeframe free-form surface matching
* Correspondence Transfer for the Registration of Multimodal Images
* COST: An Approach for Camera Selection and Multi-Object Inference Ordering in Dynamic Scenes
* Coupled Detection and Trajectory Estimation for Multi-Object Tracking
* Database and Evaluation Methodology for Optical Flow, A
* Deformable Image Mosaicing for Optical Biopsy
* Deformable Template As Active Basis
* Depth and Appearance for Mobile Scene Analysis
* Depth Information by Stage Classification
* Depth-From-Recognition: Inferring Meta-data by Cognitive Feedback
* Detecting and Localizing 3D Object Classes using Viewpoint Invariant Reference Frames
* Detecting Illumination in Images
* Detection and Tracking of Multiple Humans with Extensive Pose Articulation
* Differential EMD Tracking
* Direct Estimation of Nonrigid Registrations with Image-Based Self-Occlusion Reasoning
* Discrete Differential Operator for Direction-based Surface Morphometry, A
* Discriminant Embedding for Local Image Descriptors
* Discriminative Subsequence Mining for Action Classification
* Divide-and-Conquer Approach to 3D Object Reconstruction from Line Drawings, A
* Dynamic Cascades for Face Detection
* Dynamically consistent optical flow estimation
* DynamicBoost: Boosting Time Series Generated by Dynamical Systems
* Efficient Feature Extraction for Image Classification
* Efficient Generic Calibration Method for General Cameras with Single Centre of Projection
* Efficient Message Representations for Belief Propagation
* Efficient Mining of Frequent and Distinctive Feature Configurations
* Efficient Multi-View Reconstruction of Large-Scale Scenes using Interest Points, Delaunay Triangulation and Graph Cuts
* Efficient Optimization for L-inf-problems using Pseudoconvexity
* Efficient Silhouette Extraction with Dynamic Viewpoint
* Embedded Profile Hidden Markov Models for Shape Analysis
* Empirical Study of Object Category Recognition: Sequential Testing with Generalized Samples, An
* Event Detection in Crowded Videos
* Exploiting Object Hierarchy: Combining Models from Different Category Levels
* Exploiting Occluding Contours for Real-Time 3D Tracking: A Unified Approach
* Extracting Spatiotemporal Interest Points using Global Information
* Extracting Texels in 2.1D Natural Textures
* Eyeblink-based Anti-Spoofing in Face Recognition from a Generic Webcamera
* Fast Automatic Heart Chamber Segmentation from 3D CT Data Using Marginal Space Learning and Steerable Features
* Fast Bilinear SfM with Side Information
* Fast Crowd Segmentation Using Shape Indexing
* Fast Matching of Planar Shapes in Sub-cubic Runtime
* Fast Method to Minimize L-inf Error Norm for Geometric Vision Problems, A
* Fast Pixel/Part Selection with Sparse Eigenvectors
* Fast training and selection of Haar features using statistics in boosting-based face detection
* Feature Preserving Image Smoothing Using a Continuous Mixture of Tensors
* Fitting a Morphable Model to 3D Scans of Faces
* Game-Theoretic Multiple Target Tracking
* General Discriminant Model for Color Face Recognition, A
* Generalized Median Graphs: Theory and Applications
* Geodesic Framework for Fast Interactive Image and Video Segmentation and Matting, A
* Geolocating Static Cameras
* Geometric Integrability and Consistency of 3D Point Clouds
* Global Optimization through Searching Rotation Space and Optimal Estimation of the Essential Matrix
* Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration
* Globally Optimal Algorithm for Robust TV-L1 Range Image Integration, A
* Globally Optimal Image Segmentation with an Elastic Shape Prior
* Gradient Feature Selection for Online Boosting
* Gradient Intensity-Based Registration of Multi-Modal Images of the Brain
* Graph Based Discriminative Learning for Robust and Efficient Object Tracking
* Graph-Cut Transducers for Relevance Feedback in Content Based Image Retrieval
* Half Quadratic Analysis for Mean Shift: With Extension to A Sequential Data Mode-Seeking Method
* Harvesting Image Databases from the Web
* Hierarchical Ensemble of Global and Local Classifiers for Face Recognition
* Hierarchical Model-Based Human Motion Tracking Via Unscented Kalman Filter
* Hierarchical Part-Template Matching for Human Detection and Segmentation
* Hierarchical Semantics of Objects (hSOs)
* High Detection-rate Cascades for Real-Time Object Detection
* High Dynamic Range Camera using Reflective Liquid Crystal
* High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors
* Homographic Framework for the Fusion of Multi-view Silhouettes, A
* How Good are Local Features for Classes of Geometric Objects
* Human Pose Estimation using Motion Exemplars
* Hybrid Graph Model for Unsupervised Object Segmentation, A
* Illumination and Affine-Invariant Point Matching using an Ordinal Approach
* Image Classification using Random Forests and Ferns
* Improving Descriptors for Fast Tree Matching by Optimal Linear Projection
* Improving Numerical Accuracy of Grobner Basis Polynomial Equation Solvers
* Incremental Learning of Boosted Face Detector
* Instability of Projective Reconstruction of Dynamic Scenes near Critical Configurations
* Integrating Appearance and Motion Cues for Simultaneous Detection and Segmentation of Pedestrians
* Interactive Offline Tracking for Color Objects
* Interactive Search for Image Categories by Mental Matching
* Introducing Curvature into Globally Optimal Image Segmentation: Minimum Ratio Cycles on Product Graphs
* Invariant Large Margin Nearest Neighbour Classifier, An
* Joint Affinity Propagation for Multiple View Segmentation
* Joint Feature Tracking and Radiometric Calibration from Auto-Exposure Video
* Joint Manifold Model for Semi-supervised Multi-valued Regression, The
* L-inf Approach to Structure and Motion Problems in 1D-Vision, An
* Laplacian PCA and Its Applications
* Latent Model Clustering and Applications to Visual Recognition
* Layer-Based Restoration Framework for Variable-Aperture Photography, A
* Learn to Track Edges
* Learning 3-D Scene Structure from a Single Still Image
* Learning Auto-Structured Regressor from Uncertain Nonnegative Labels
* Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification
* Learning Graph Matching
* Learning Motion Correlation for Tracking Articulated Human Body with a Rao-Blackwellised Particle Filter
* Learning Multiscale Representations of Natural Scenes Using Dirichlet Processes
* Learning priors for calibrating families of stereo cameras
* Learning Structured Appearance Models from Captioned Images of Cluttered Scenes
* Learning The Discriminative Power-Invariance Trade-Off
* Learning the Taxonomy and Models of Categories Present in Arbitrary Images
* Learning to Find Object Boundaries Using Motion Cues
* Leveraging archival video for building face datasets
* Limits of Learning-Based Superresolution Algorithms
* Locally Invariant Fractal Features for Statistical Texture Classification
* Locally Smooth Metric Learning with Application to Image Retrieval
* LogCut: Efficient Graph Cut Optimization for Markov Random Fields
* Matte-less, Variational Approach to Automatic Scene Compositing, A
* Metric Learning Using Iwasawa Decomposition
* Minimizing the Reprojection Error in Surface Reconstruction from Images
* Mixture-of-Parts Pictorial Structures for Objects with Variable Part Sets
* Mode-seeking by Medoidshifts
* Modeling View and Posture Manifolds for Tracking
* Monocular SLAM as a Graph of Coalesced Observations
* Moving Object Extraction with a Hand-held Camera
* MRF Optimization via Dual Decomposition: Message-Passing Revisited
* Multi-Camera Calibration with One-Dimensional Object under General Motions
* Multi-Image Restoration Method for Image Reconstruction from Projections, A
* Multi-View Stereo for Community Photo Collections
* Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh
* Multilinear Projection for Appearance-Based Recognition in the Tensor Framework
* Multiscale Edge Detection and Fiber Enhancement Using Differences of Oriented Means
* Multispectral Imaging Using Multiplexed Illumination
* N3M: Natural 3D Markers for Real-Time Object Detection and Pose Estimation
* New Convolution Kernel for Atmospheric Point Spread Function Applied to Computer Vision, A
* No Grouping Left Behind: From Edges to Curve Fragments
* Noise Robust Spectral Clustering
* Non-homogeneous Content-driven Video-retargeting
* Non-Linear Beam Model for Tracking Large Deformations
* Non-metric affinity propagation for unsupervised image categorization
* Non-Parametric Probabilistic Image Segmentation
* Non-Rigid Image Registration using a Hierarchical Partition of Unity Finite Element Method
* Non-rigid Photometric Stereo with Colored Lights
* Nonlinear Discriminative Approach to AAM Fitting, A
* Normalized Cuts Revisited: A Reformulation for Segmentation with Linear Grouping Constraints
* Novel Depth Cues from Uncalibrated Near-field Lighting
* Novel High Breakdown M-estimator for Visual Data Segmentation, A
* Objects in Context
* Omnidirectional Vision Sensor with Single View and Constant Resolution, An
* On Constrained Sparse Matrix Factorization
* On the Differential Geometry of 3D Flow Patterns: Generalized Helicoids and Diffusion MRI Analysis
* On the Extraction of Curve Skeletons using Gradient Vector Flow
* Optimization and Learning for Registration of Moving Dynamic Textures
* Optimizing Image Registration by Mutually Exclusive Scale Components
* Out-of-Core Bundle Adjustment for Large-Scale 3D Reconstruction
* Pairwise Similarities across Images for Multiple View Rigid/Non-Rigid Segmentation and Registration
* Parsing Images of Architectural Scenes
* Penrose Pixels Super-Resolution in the Detector Layout Domain
* People-LDA: Anchoring Topics to People using Face Recognition
* Perspectively Invariant Normal Features
* Phase Based Modelling of Dynamic Textures
* Plane-based self-calibration of radial distortion
* pLSA for Sparse Arrays With Tsallis Pseudo-Additive Divergence: Noise Robustness and Algorithm
* Population Shape Regression from Random Design Data
* PR: More than Meets the Eye
* Probabilistic Color and Adaptive Multi-Feature Tracking with Dynamically Switched Priority Between Cues
* Probabilistic Fusion Tracking Using Mixture Kernel-Based Bayesian Filtering
* Probabilistic Linear Discriminant Analysis for Inferences About Identity
* probabilistic, hierarchical, and discriminant framework for rapid and accurate detection of deformable anatomic structure, A
* Proximity Distribution Kernels for Geometric Context in Category Recognition
* Rank Minimization Approach to Video Inpainting, A
* Real-time Accurate Object Detection using Multiple Resolutions
* Real-time Body Tracking Using a Gaussian Process Latent Variable Model
* Real-Time Marker-free Motion Capture from multiple cameras
* Real-Time SLAM Relocalisation
* Real-Time Visibility-Based Fusion of Depth Maps
* Reconstructing High Quality Face-Surfaces using Model Based Stereo
* Reconstructing the Surface of Inhomogeneous Transparent Scenes by Scatter-Trace Photography
* Recovering Occlusion Boundaries from a Single Image
* Relative Epipolar Motion of Tracked Features for Correspondence in Binocular Stereo
* Removing Non-Uniform Motion Blur from Images
* Restoration Framework for Correcting Photometric and Geometric Distortions in Camera-based Document Images, A
* Retrieving actions in movies
* Ricci Flow for 3D Shape Analysis
* Robust Estimation of Albedo for Illumination-invariant Matching and Shape Recovery
* Robust Graph-Based Method for The General Correspondence Problem Demonstrated on Image Stitching, A
* Robust Object Tracking with Regional Affine Invariant Features
* Robust Structured Light Coding for 3D Reconstruction
* Robust Visual Tracking Based on Incremental Tensor Subspace Learning
* Robust Visual Tracking Using the Time-Reversibility Constraint
* Rock, Paper, and Scissors: Extrinsic vs. intrinsic similarity of non-rigid shapes
* Rotational Motion Deblurring of a Rigid Object from a Single Image
* Scalable Approach to Activity Recognition based on Object Use, A
* Scale-Dependent 3D Geometric Features
* Scale-Invariant Features on the Sphere
* Scene Modeling Using Co-Clustering
* Scene Representation Based on Multi-Modal 2D and 3D Features, A
* Scene Summarization for Online Image Collections
* Seeded Image Segmentation Framework Unifying Graph Cuts And Random Walker Which Yields A New Algorithm, A
* Segmentation using Meta-texture Saliency
* Semi-supervised Discriminant Analysis
* Separating Parts from 2D Shapes using Relatability
* Shape and Appearance Context Modeling
* Shape Descriptors for Maximally Stable Extremal Regions
* Shape from Focus and Defocus: Convexity, Quasiconvexity and Defocus-Invariant Textures
* Shape from Varying Illumination and Viewpoint
* Shape Priors using Manifold Learning Techniques
* Shape Reconstruction Based on Similarity in Radiance Changes under Varying Illumination
* Shining a Light on Human Pose: On Shadows, Shading and the Estimation of Pose and Shape
* Signals on Pencils of Lines
* Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series
* Simultaneous Segmentation and 3D Reconstruction of Monocular Image Sequences
* Spatial Random Partition for Common Visual Pattern Discovery
* Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes
* Spatio-Temporal Shape from Silhouette using Four-Dimensional Delaunay Meshing
* Spectral Latent Variable Models for Perceptual Inference
* Spectral Regression for Efficient Regularized Subspace Learning
* Specular Highlight Detection Based on the Fresnel Reflection Coefficient
* Spherical-Homoscedastic Shapes
* Stable Affine Frames on Isophotes
* Steerable Random Fields
* Stereo Matching with the Distinctive Similarity Measure
* Stochastic Adaptive Tracking In A Camera Network
* Structure from Motion with Missing Data is NP-Hard
* Structure from Statistics: Unsupervised Activity Analysis using Suffix Trees
* Study of Face Recognition as People Age, A
* Supervised Learning of Image Restoration with Convolutional Networks
* Support Kernel Machines for Object Recognition
* Surface-from-Gradients with Incomplete Data for Single View Modeling
* Symmetry-Based Generative Model for Shape, A
* Synthetic Aperture Tracking: Tracking through Occlusions
* Task Specific Local Region Matching
* Temporal Segmentation of Facial Behavior
* Temporally Consistent Reconstruction from Multiple Video Streams Using Enhanced Belief Propagation
* Ten-fold Improvement in Visual Odometry Using Landmark Matching
* Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
* Toward a Theory of Shape from Specular Flow
* Toward Reconstructing Surfaces With Arbitrary Isotropic Reflectance: A Stratified Photometric Stereo Approach
* Trajectory Rectification and Path Modeling for Video Surveillance
* Two-View Motion Segmentation by Mixtures of Dirichlet Process with Model Selection and Outlier Removal
* Uninitialized, Globally Optimal, Graph-Based Rectilinear Shape Segmentation: The Opposing Metrics Method
* Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images
* Unsupervised Joint Alignment of Complex Images
* Unsupervised Learning of Object Deformation Models
* Untangling Cycles for Contour Grouping
* Using Color Compatibility for Assessing Image Realism
* Using High-Level Visual Information for Color Constancy
* USSR: A Unified Framework for Simultaneous Smoothing, Segmentation, and Registration of Multiple Images
* Variable Dimensional Local Shape Descriptors for Object Recognition in Range Data
* Variational Framework for Simultaneous Motion Estimation and Restoration of Motion-Blurred Video, A
* Variational Method for Scene Flow Estimation from Stereo Sequences, A
* Variational optimal control technique for the tracking of deformable objects
* Variational Particle Filter for Multi-Object Tracking
* Variational Segmentation using Fuzzy Region Competition and Local Non-Parametric Probability Density Functions
* Variational Stereo Vision with Sharp Discontinuities and Occlusion Handling
* Vector Quantizing Feature Space with a Regular Lattice
* Video-based Face Recognition on Real-World Data
* Visual Tracking by Affine Kernel Fitting Using Color and Object Boundary
* von Kries Hypothesis and a Basis for Color Constancy, The
* Webcam Synopsis: Peeking Around the World
* What Can Casual Walkers Tell Us About A 3D Scene?
* What, where and who? Classifying events by scene and object recognition
* When is a Discrete Diffusion a Scale-Space?
290 for ICCV07

* 3D Open-surface Shape Correspondence for Statistical Shape Modeling: Identifying Topologically Consistent Landmarks
* 3D reconstruction from image collections with a single known focal length
* Absolute scale in structure from motion from a single vehicle mounted camera by exploiting nonholonomic constraints
* Action detection in complex scenes with spatial and temporal ambiguities
* Actionable information in vision
* Active Appearance Models with Rotation Invariant Kernels
* Active Segmentation with Fixation
* Active skeleton for non-rigid object detection
* Active subspace learning
* Activity recognition using the velocity histories of tracked keypoints
* Adaptive fragments-based tracking of non-rigid objects using level sets
* algebraic approach to affine registration of point sets, An
* algebraic model for fast corner detection, An
* algorithm for minimizing the Mumford-Shah functional, An
* Analysis of Orientation and Scale in Smoothly Varying Textures
* Associative hierarchical CRFs for object class image segmentation
* Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions
* Attribute and simile classifiers for face verification
* Automatic Annotation of Human Actions in Video
* Automatic learning and extraction of multi-local features
* Automatic ovarian follicle quantification from 3D ultrasound data using global/local context with database guided segmentation
* Background Subtraction for Freely Moving Cameras
* Bayesian Poisson regression for crowd counting
* Bayesian selection of scaling laws for motion modeling in images
* Beyond connecting the dots: A polynomial-time algorithm for segmentation and boundary estimation with imprecise user input
* Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel
* biased sampling strategy for object categorization, A
* BLOGS: Balanced local and global search for non-degenerate two view epipolar geometry
* Body-relative Navigation Guidance Using Uncalibrated Cameras
* Boundary ownership by lifting to 2.1D
* branch-and-bound algorithm for globally optimal calibration of a camera-and-rotation-sensor system, A
* Building recognition using sketch-based representations and spectral graph matching
* Building Rome in a day
* Class Segmentation and Object Localization with Superpixel Neighborhoods
* Coarse registration of 3D surface triangulations based on moment invariants with applications to object alignment and identification
* Coded aperture pairs for depth from defocus
* Color constancy using 3D scene geometry
* Combining efficient object localization and image classification
* Compact signatures for high-speed interest point description and matching
* Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo
* Complex Volume and Pose Tracking with Probabilistic Dynamical Models and Visual Hull Constraints
* Component Analysis Approach to Estimation of Tissue Intensity Distributions of 3D Images
* Computation complexity of branch-and-bound model selection
* Consensus set maximization with guaranteed global optimality for robust geometry estimation
* Constrained clustering by spectral kernel learning
* Constructing implicit 3D shape models for pose estimation
* Context by region ancestry
* Convex multi-region segmentation on manifolds
* Convex optimization for multi-class image labeling with a novel family of total variation based regularizers
* Correlated Probabilistic Trajectories for Pedestrian Motion Detection
* Curvature regularity for region-based image segmentation and inpainting: A linear programming relaxation
* Decomposing a scene into geometric and semantically consistent regions
* Deformable model fitting with a mixture of local experts
* Deformation invariant image matching by spectrally controlled diffeomorphic alignment
* Dense 3D Reconstruction Method Using a Single Pattern for Fast Moving Object
* Detecting interpretable and accurate scale-invariant keypoints
* Detecting objects in large image collections and videos by efficient subimage retrieval
* Detection and removal of chromatic moving shadows in surveillance scenarios
* Detection Driven Adaptive Multi-cue Integration for Multiple Human Tracking
* Detection of human actions from a single example
* Diagram techniques for multiple view geometry
* dimensionality of scene appearance, The
* Dimensionality reduction and principal surfaces via Kernel Map Manifolds
* direct approach for efficiently tracking with 3D morphable models, A
* Directional statistics BRDF model
* Discriminative generalized Hough transform for object detection
* Discriminative Models for Multi-Class Object Layout
* Display-camera calibration from eye reflections
* Domain adaptive semantic diffusion for large scale context-based video annotation
* efficient algorithm for Co-segmentation, An
* Efficient discriminative learning of parts-based models
* Efficient discriminative local learning for object recognition
* Efficient human pose estimation via parsing a tree structure based human model
* Efficient indexing for large scale visual search
* Efficient multi-label ranking for multi-class learning: Application to object recognition
* Efficient privacy preserving video surveillance
* Efficient segmentation using feature-based graph partitioning active contours
* Efficient subset selection via the kernelized Rényi distance
* Efficient, High-quality Image Contour Detection
* Estimating Contact Dynamics
* Estimating Human Shape and Pose from a Single Image
* Estimating natural illumination from a single outdoor image
* Evaluating information contributions of bottom-up and top-down processes
* Exploiting uncertainty in random sample consensus
* Extending continuous cuts: Anisotropic metrics and expansion moves
* Face alignment through subspace constrained mean-shifts
* Face recognition with contiguous occlusion using Markov random fields
* Factorizing Scene Albedo and Depth from a Single Foggy Image
* Fast and robust Earth Mover's Distances
* Fast Ray features for learning irregular shapes
* Fast realistic multi-action recognition using mined dense spatio-temporal features
* Fast Visibility Restoration from a Single Color or Gray Level Image
* Feature correspondence and deformable object matching via agglomerative correspondence clustering
* Feature-centric Efficient Subwindow Search
* Filter Flow
* Finding good composition in panoramic scenes
* Finding shareable informative patterns and optimal coding matrix for multiclass boosting
* FLoSS: Facility location for subspace segmentation
* framework for visual saliency detection with applications to image thumbnailing, A
* global perspective on MAP inference for low-level vision, A
* Globally optimal affine epipolar geometry from apparent contours
* Globally optimal segmentation of multi-region objects
* Gradient domain layer separation under independent motion
* Graph cuts using a Riemannian metric induced by tensor voting
* Ground truth dataset and baseline evaluations for intrinsic image algorithms
* Group-sensitive multiple kernel learning for object categorization
* GroupSAC: Efficient consensus in the presence of groupings
* hand-held photometric stereo camera for 3-D modeling, A
* Heterogeneous feature machines for visual recognition
* Hierarchical 3D diffusion wavelet shape priors
* Hierarchical Gaussianization for image classification
* Hierarchical learning for tubular structure parsing in medical imaging: A study on coronary arteries using 3D CT Angiography
* High-resolution shape reconstruction from multiple range images based on simultaneous estimation of surface and motion
* Higher-order gradient descent by fusion-move graph cut
* HOG-LBP Human Detector with Partial Occlusion Handling, An
* Human Detection Using Partial Least Squares Analysis
* Human Pose Estimation Using Consistent Max Covering
* hybrid generative/discriminative classification framework based on free-energy terms, A
* I Know What You Did Last Summer: Object-level Auto-annotation of Holiday Snaps
* Illumination Aware MCMC Particle Filter for Long-term Outdoor Multi-object Simultaneous Tracking and Classification
* Image annotation using multi-label correlated Green's function
* Image compression with anisotropic triangulations
* Image restoration using online photo collections
* Image Saliency by Isocentric Curvedness and Color
* Image segmentation with a bounding box prior
* Image segmentation with simultaneous illumination and reflectance estimation: An energy minimization approach
* Image sequence geolocation with human travel priors
* Implicit color segmentation features for pedestrian and object detection
* Improving accuracy of geometric parameter estimation using projected score method
* Incremental action recognition using feature-tree
* Incremental discriminative-analysis of canonical correlations for action recognition
* Incremental Multiple Kernel Learning for Object Recognition
* infinite Hidden Markov random field model, The
* information theoretic approach for tracker performance evaluation, An
* Is a detector only good for detection?
* Is Dual Linear Self-calibration Artificially Ambiguous?
* Is that you? Metric learning approaches for face identification
* Joint learning of visual attributes, object classes and visual saliency
* Joint optimization of segmentation and appearance models
* Joint Pose Estimator and Feature Learning for Object Detection
* Jointly Estimating Demographics and Height with a Calibrated Camera
* Kernel active contour
* Kernel map compression using generalized radial basis functions
* Kernel methods for weakly supervised mean shift clustering
* Kernelized locality-sensitive hashing for scalable image search
* Keyframe-based Real-time Camera Tracking
* Label set perturbation for MRF based neuroimaging segmentation
* LabelMe video: Building a video database with human annotations
* Landmark classification in large-scale image collections
* Landmark-based sparse color representations for color transfer
* Large displacement optical flow computation without warping
* Large-scale privacy protection in Google Street View
* latent model of discriminative aspect, A
* Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories
* Learning actions from the Web
* Learning based digital matting
* Learning deformable action templates from cluttered videos
* Learning image similarity from Flickr groups using Stochastic Intersection Kernel Machines
* Learning long term face aging patterns from partially dense aging databases
* Learning pedestrian dynamics from the real world
* Learning to predict where humans look
* Learning with dynamic group sparsity
* Least-squares congealing for large numbers of images
* Level set segmentation with both shape and intensity priors
* Light Field Video Stabilization
* linear formulation of shape from specular flow, A
* LIVEcut: Learning-based Interactive Video Segmentation by Evaluation of Multiple Propagated Cues
* Local Distance Functions: A Taxonomy, New Algorithms, and an Evaluation
* Local Trinary Patterns for human action recognition
* Looking Around the Corner Using Transient Imaging
* Markov Clustering Topic Model for Mining Behaviour in Video, A
* Matching as a non-cooperative game
* Max-margin additive classifiers for detection
* Minimizing energy functions on 4-connected lattices using elimination
* Mode-detection via median-shift
* Modeling 3D Human Poses from Uncalibrated Monocular Images
* Modeling deformable objects from a single depth camera
* Modelling Activity Global Temporal Dependencies Using Time Delayed Probabilistic Graphical Model
* Moving in stereo: Efficient structure and motion using lines
* multi-sample, multi-tree approach to bag-of-words image representation for image retrieval, A
* Multi-scale object detection by clustering lines
* Multimodal Partial Estimates Fusion
* Multiperspective stereo matching and volumetric reconstruction
* Multiple kernels for object detection
* Multiple view semantic segmentation for street view images
* Multiscale Symmetric Part Detection and Grouping
* near optimal acceptance-rejection algorithm for exact cross-correlation search, A
* new minimal solution to the relative pose of a calibrated stereo camera with small field of view overlap, A
* new multiview spacetime-consistent depth recovery framework for free viewpoint video rendering, A
* Non-Euclidean image-adaptive Radial Basis Functions for 3D interactive segmentation
* Non-iterative approach for fast and accurate vanishing point detection
* Non-local sparse models for image restoration
* Non-negative matrix factorization of partial track data for motion segmentation
* Non-rigid Object Localization and Segmentation Using Eigenspace Representation
* Normalized Subspace Inclusion: Robust clustering of motion subspaces, The
* novel approach to expression recognition from non-frontal face images, A
* On Feature Combination for Multiclass Object Classification
* On optimizing subspaces for face recognition
* One-Shot similarity kernel, The
* Optical flow estimation on coarse-to-fine region-trees using discrete optimization
* Optimal correspondences from pairwise constraints
* Optimal multiple surfaces searching for video/image resizing: A graph-theoretic approach
* Optimizing parametric total variation models
* Packing bag-of-features
* Patch-based within-object classification
* Piecewise Planar Stereo for Image-based Rendering
* Piecewise-consistent color mappings of images acquired under various conditions
* Plane-based calibration of central catadioptric cameras
* Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations
* Power watersheds: A new image segmentation framework extending graph cuts, random walker and optimal spanning forest
* Prism-based System for Multispectral Video Acquisition, A
* Probabilistic Framework for Partial Intrinsic Symmetries in Geometric Data, A
* Probabilistic occlusion boundary detection on spatio-temporal lattices
* Quantifying contextual information for object detection
* Quasi-periodic Event Analysis for Social Game Retrieval
* Radiometric compensation using stratified inverses
* RankBoost with L1 regularization for facial expression recognition and intensity estimation
* Real-time Visual Tracking via Incremental Covariance Tensor Learning
* Realtime background subtraction from dynamic scenes
* Recognizing Actions by Shape-motion Prototype Trees
* Reconstructing 3D Motion Trajectories of Particle Swarms by Global Correspondence Selection
* Reconstructing Building Interiors from Images
* Recovering planar homographies between 2D shapes
* Recovering the spatial layout of cluttered rooms
* Resilient Subclass Discriminant Analysis
* Riemannian analysis of 3D nose shapes for partial human biometrics, A
* Riemannian Bayesian estimation of diffusion tensor images
* robust boosting tracker with minimum error bound in a co-training framework, A
* Robust dynamical model for simultaneous registration and segmentation in a variational framework: A Bayesian approach
* robust elastic and partial matching metric for face recognition, A
* Robust Facial Feature Tracking Using Selected Multi-resolution Linear Predictors
* Robust fitting of multiple structures: The statistical learning approach
* Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes
* Robust image segmentation using learned priors
* Robust matching of building facades under large viewpoint changes
* Robust Motion Estimation Using Trajectory Spectrum Learning: Application to Aortic and Mitral Valve Modeling from 4D TEE
* Robust multilinear principal component analysis
* Robust tracking-by-detection using a detector confidence particle filter
* Robust Visual Tracking Using L1 Minimization
* Saliency driven total variation segmentation
* Scale invariance and noise in natural images
* Scene shape priors for superpixel segmentation
* SCRAMSAC: Improving RANSAC's efficiency with a spatial consistency filter
* Seeing 3D Objects in a Single 2D Image
* Seeing through Water: Image Restoration Using Model-based Tracking
* Segmentation, Ordering and Multi-object Tracking Using Graphical Models
* Selection and context for action recognition
* Self-Aware Matching Measure for stereo, The
* Semi-automatic stereo extraction from video footage
* Semi-Supervised Random Forests
* Shadow cameras: Reciprocal views from illumination masks
* Shape analysis with multivariate tensor-based morphometry and holomorphic differentials
* Shape guided contour grouping with particle filters
* shape-based object class model for knowledge transfer, A
* Shape-based recognition of 3D point clouds in urban environments
* Shift-map image editing
* Similarity metrics for categorization: From monolithic to category specific
* Simultaneous alignment and clustering for an image ensemble
* Simultaneous and orthogonal decomposition of data using Multimodal Discriminant Analysis
* Simultaneous Camera Pose and Correspondence Estimation in Cornerless Images
* Simultaneous color consistency and depth map estimation for radiometrically varying stereo images
* Simultaneous photometric invariance and shape recovery
* Single view reconstruction using shape grammars for urban environments
* Sparse representation of cast shadows via L1-regularized least squares
* Sparsity induced similarity measure for label propagation
* Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities
* Spectral clustering of linear subspaces for motion segmentation
* Spectral error correcting output codes for efficient multiclass recognition
* Stabilizing Motion Tracking Using Retrieved Motion Priors
* Static multi-camera factorization using rigid motion
* Stereo from flickering caustics
* Storyboard Sketches for Content Based Video Retrieval
* Structural SVM for visual localization and continuous state estimation
* Structure and kinematics triangulation with a rolling shutter stereo rig
* Structure- and motion-adaptive regularization for high accuracy optic flow
* study on automatic age estimation using a large database, A
* Studying brain morphometry using conformal equivalence class
* Subspace matching: Unique solution to point matching with geometric constraints
* Super-resolution from a single image
* Superresolution texture maps for multiview reconstruction
* SURF Tracking
* Swap and Expansion moves revisited and fused, The
* TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
* Template-free Monocular Reconstruction of Deformable Surfaces
* Tensor Completion for Estimating Missing Values in Visual Data
* Texel-based texture segmentation
* theory of active object localization, A
* Time series prediction by chaotic modeling of nonlinear dynamical systems
* Time-constrained Photography
* Top-down color attention for object recognition
* Tracking a hand manipulating an object
* Tracking a Large Number of Objects from Multiple Views
* Tracking in unstructured crowded scenes
* Unlabeled data improves word prediction
* Unsupervised face alignment by robust nonrigid mapping
* Unsupervised learning of high-order structural semantics from images
* Untangling fibers by quotient appearance manifold mapping for grayscale shape classification
* Using individuality to track individuals: Clustering individual trajectories in crowds using local appearance and frequency trait
* Video object segmentation by tracking regions
* Video scene categorization by 3D hierarchical histogram matching
* Video Scene Understanding Using Multi-scale Analysis
* Video Stabilization Using Robust Feature Trajectories
* Weakly supervised discriminative localization and classification: A joint learning process
* Weighted graph characteristics from oriented line graph polynomials
* What is the best multi-stage architecture for object recognition?
* Which Faces to Tag: Adding Prior Constraints into Active Learning
* Wide-baseline image matching using Line Signatures
* You'll never walk alone: Modeling social behavior for multi-target tracking
309 for ICCV09

* 2D-3D fusion for layer decomposition of urban facades
* 3D Laplacian-driven parametric deformable model, A
* 3D reconstruction of a smooth articulated trajectory from a monocular image sequence
* 3D scene flow estimation with a rigid motion prior
* Accurate 3D pose estimation from a single depth image
* Action recognition in cluttered dynamic scenes using Pose-Specific Part Models
* Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories
* Action recognition using rank-1 approximation of Joint Self-Similarity Volume
* Active clustering of document fragments using information derived from both images and catalogs
* Active geodesics: Region-based active contour segmentation with a global edge-based constraint
* Active scene recognition with vision and language
* Actively selecting annotations among objects and attributes
* adaptive coupled-layer visual model for robust visual tracking, An
* Adaptive deconvolutional networks for mid and high level feature learning
* Adversarial Optimization Approach to Efficient Outlier Removal, An
* Aerial 3D reconstruction with line-constrained dynamic programming
* Annotator rationales for visual recognition
* Are spatial and global constraints really necessary for segmentation?
* Articulated part-based model for joint object detection and pose estimation
* Ask the locals: Multi-way local pooling for image recognition
* Assessing the aesthetic quality of photographs using generic image descriptors
* Automated articulated structure and 3D shape recovery from point correspondences
* Automated corpus callosum extraction via Laplace-Beltrami nodal parcellation and intrinsic geodesic curvature flows on surfaces
* automatic assembly and completion framework for fragmented skulls, An
* Automatic construction of an action video shot database using web videos
* Automatic salient object extraction with contextual cue
* Basis constrained 3D scene flow on a dynamic proxy
* BiCoS: A Bi-level co-segmentation method for image classification
* Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance
* Blurred target tracking by Blur-driven Tracker
* Blurring-invariant Riemannian metrics for comparing signals and images
* BRISK: Binary Robust Invariant Scalable Keypoints
* Building a better probabilistic model of images by factorization
* Building large urban environments from unstructured point data
* CARD: Compact And Real-time Descriptors
* Center-surround divergence of feature statistics for salient object detection
* Centralized sparse representation for image restoration
* chains model for localizing participants of group activities in videos, A
* Close the loop: Joint blind image restoration and recognition with sparse representation prior
* Cluster-based color space optimizations
* Coherency Sensitive Hashing
* Color photometric stereo for multicolored surfaces
* Compact correlation coding for visual object categorization
* Complementary hashing for approximate nearest neighbor search
* Conditional Random Fields for multi-camera object detection
* Content-Based Photo Quality Assessment
* Contextual Weighting for Vocabulary Tree Based Image Retrieval
* Contour Code: Robust and efficient multispectral palmprint encoding for human recognition
* convex framework for image segmentation with moment constraints, A
* Convex multi-region probabilistic segmentation with shape prior in the isometric log-ratio transformation space
* Correlative multi-label multi-instance image annotation
* Correspondence free registration through a point-to-model distance minimization
* data-driven approach for real-time full body pose reconstruction from a depth camera, A
* Data-driven crowd analysis in videos
* Decision tree fields
* Decoupling photometry and geometry in dense variational camera calibration
* Delta-Dual Hierarchical Dirichlet Processes: A pragmatic abnormal behaviour detector
* Dense disparity maps from sparse disparity measurements
* Dense one-shot 3D reconstruction by detecting continuous regions with parallel line projection
* Density-aware person detection and tracking in crowds
* Describing people: A poselet-based approach to attribute classification
* Detailed reconstruction of 3D plant root shape
* Diagonal preconditioning for first order primal-dual algorithms in convex optimization
* Diffuse reflectance imaging with astronomical applications
* Diffusion runs low on persistence fast
* Digital anti-aging in face images
* dimensionality result for multiple homography matrices, A
* Direct Least-Squares (DLS) method for PnP, A
* Discovering favorite views of popular places with iconoid shift
* Discovering object instances from scenes of Daily Living
* Discriminative figure-centric models for joint action localization and recognition
* Discriminative high order SVD: Adaptive tensor subspace selection for image classification, clustering, and retrieval
* Discriminative learning of relaxed hierarchy for large-scale visual recognition
* Discriminative Multimanifold Analysis for Face Recognition from a Single Training Sample per Person
* Distributed cosegmentation via submodular optimization on anisotropic diffusion
* Domain adaptation for object recognition: An unsupervised approach
* Double window optimisation for constant time visual SLAM
* DTAM: Dense tracking and mapping in real-time
* Dyadic transfer learning for cross-domain image classification
* Dynamic and hierarchical multi-structure geometric model fitting
* Dynamic fluid surface acquisition using a camera array
* Dynamic Manifold Warping for view invariant action recognition
* Dynamic subspace-based coordinated multicamera tracking
* Dynamic texture classification using dynamic fractal analysis
* Edge foci interest points
* Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms
* Efficient learning of sparse, distributed, convolutional feature representations for object recognition
* Efficient Orthogonal Matching Pursuit using sparse random projections for scene and video classification
* Efficient parallel message computation for MAP inference
* Efficient Regression of General-Activity Human Poses from Depth Images
* Efficient similarity search for covariance matrices via the Jensen-Bregman LogDet Divergence
* End-to-end scene text recognition
* Ensemble of exemplar-SVMs for object detection and beyond
* Evaluation of image features using a photorealistic virtual world
* Exemplar extraction using spatio-temporal hierarchical agglomerative clustering for face recognition in video
* Exploiting the Manhattan-world assumption for extrinsic self-calibration of multi-modal sensor networks
* Exploring regularized feature selection for person specific face verification
* Extracting adaptive contextual cues from unlabeled regions
* Extracting foreground masks towards object recognition
* Face recognition based on non-corresponding region matching
* Face recognition via local sparse coding
* Face reconstruction in the wild
* FACS valid 3D dynamic action unit database with applications to 3D dynamic morphable facial modeling, A
* Fast articulated motion tracking using a sums of Gaussians body model
* Fast image-based localization using direct 2D-to-3D matching
* Fast removal of non-uniform camera shake
* Fast template matching in non-linear tone-mapped images
* Feature seeding for action recognition
* Fisher Discrimination Dictionary Learning for sparse representation
* Fourier Active Appearance Models
* From contours to 3D object detection and pose estimation
* From images to scenes: Compressing an image cluster into a single scene model for place recognition
* From learning models of natural image patches to whole image restoration
* Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints
* Fully automatic pose-invariant face recognition via 3D pose normalization
* Fusing generic objectness and visual saliency for salient object detection
* Fusing visual and range imaging for object class recognition
* Gaussian process regression flow for analysis of motion trajectories
* general preconditioning scheme for difference measures in deformable registration, A
* Generalized background subtraction based on hybrid inference by belief propagation and Bayesian filtering
* Generalized ordering constraints for multilabel optimization
* Generalized roof duality for pseudo-Boolean optimization
* Generalized Subgraph Preconditioners for Large-Scale Bundle Adjustment
* generalized trace-norm and its application to structure-from-motion problems, The
* geometric solver for calibrated stereo egomotion, A
* Geometrically consistent elastic matching of 3D shapes: A linear programming solution
* Geometrically consistent stereo seam carving
* Globally optimal solution to multi-object tracking with merged measurements
* Gradient-based learning of higher-order image features
* graph cut algorithm for higher-order Markov Random Fields, A
* Graph mode-based contextual kernels for robust SVM tracking
* graph-matching kernel for object categorization, A
* Handling Label Noise in Video Classification via Multiple Instance Learning
* Handling outliers in non-blind image deconvolution
* HEAT: Iterative relevance feedback with one million images
* High quality depth map upsampling for 3D-TOF cameras
* High quality image reconstruction from RAW and JPEG image pair
* HMDB: A large video database for human motion recognition
* Home 3D body scans from noisy image and range data
* Hough-based tracking of non-rigid objects
* Human action recognition by learning bases of action attributes and parts
* Human activity prediction: Early recognition of ongoing activities from streaming videos
* iGroup: Weakly supervised image and video grouping
* Illumination demultiplexing from a single image
* Image based detection of geometric changes in urban environments
* Image representation by active curves
* Image segmentation by figure-ground composition into maximal cliques
* Imaging via three-dimensional compressive sampling (3DCS)
* In defense of soft-assignment coding
* Incremental on-line semi-supervised learning for segmenting the left ventricle of the heart from ultrasound data
* Inferring human gaze from appearance via adaptive linear regression
* Inferring social relations from visual concepts
* Informative feature selection for object recognition via Sparse PCA
* Integrating local classifiers through nonlinear dynamics on label graphs with an application to image segmentation
* Introducing total curvature for image processing
* Isotonic CCA for sequence alignment and activity recognition
* joint learning framework for attribute models and object descriptions, A
* Kernel non-rigid structure from motion
* Key-segments for video object segmentation
* Kinecting the dots: Particle based scene flow from depth sensors
* Large-scale image annotation using visual synset
* Latent Low-Rank Representation for subspace segmentation and feature extraction
* Latent structured models for human pose estimation
* Learning a category independent object detection cascade
* Learning a mixture of sparse distance metrics for classification and dimensionality reduction
* Learning component-level sparse representation using histogram information for image classification
* Learning cross-modality similarity for multinomial data
* Learning equivariant structured output SVM regressors
* Learning nonlinear distance functions using neural network for regression with application to robust human age estimation
* Learning occlusion with likelihoods for visual tracking
* Learning parameterized histogram kernels on the simplex manifold for image and action classification
* Learning spatiotemporal graphs of human activities
* Learning specific-class segmentation from diverse data
* Learning to cluster using high order graphical models with latent variables
* Learning to predict the perceived visual quality of photos
* Learning universal multi-view age estimator using video context
* Level-set person segmentation and tracking with multi-region appearance models and top-down shape information
* Linear dependency modeling for feature fusion
* Linear stereo matching
* linear subspace learning approach via sparse coding, A
* Linear time offline tracking and lower envelope algorithms
* Local Intensity Order Pattern for feature description
* Localized principal component analysis based curve evolution: A divide and conquer approach
* Locally rigid globally non-rigid surface registration
* Low order dynamics embedding for high dimensional time series
* Manhattan scene understanding using monocular, stereo, and 3D features
* Markov Random Field-based fitting of a subdivision-based geometric atlas
* Material-specific user colour profiles from imaging spectroscopy data
* Maximizing all margins: Pushing face recognition with Kernel Plurality
* Means in spaces of tree-like shapes
* medial feature detector: Stable regions from image boundaries, The
* Minimum near-convex decomposition for robust shape representation
* Modeling image similarity by Gaussian mixture models and the Signature Quadratic Form Distance
* Modeling spatial layout with Fisher vectors for image categorization
* Modeling temporal coherence for optical flow
* Multi-class semi-supervised SVMs with Positiveness Exclusive Regularization
* Multi-hypothesis motion planning for visual object tracking
* Multi-label visual classification with label exclusive context
* Multi-observation visual recognition via joint dynamic sparse representation
* Multi-task low-rank affinity pursuit for image segmentation
* Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction
* Multi-view repetitive structure detection
* Multiclass recognition and part localization with humans in the loop
* Multiclass transfer learning from unconstrained priors
* Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes
* Multiplexed illumination for scene recovery in the presence of global illumination
* Multiscale, curvature-based shape representation for surfaces
* Multiview 3D warps
* Multiview structure from motion in trajectory space
* N-best maximal decoders for part models
* NBNN kernel, The
* new distance for scale-invariant 3D shape recognition and registration, A
* Non-stationary correction of optical aberrations
* nonparametric Riemannian framework on tensor field with application to foreground segmentation, A
* Object detection and segmentation from joint embedding of parts and pixels
* Object recoloring based on intrinsic image estimation
* Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions
* On the repeatability of the local reference frame for partial shape matching
* Optical flow estimation using learned sparse model
* Optimal estimation of vanishing points in a Manhattan world
* Optimal landmark detection using shape models and branch and bound
* Optimal object matching via convexification and composition
* Optimizing polynomial solvers for minimal geometry problems
* ORB: An efficient alternative to SIFT or SURF
* Outdoor human motion capture using inverse kinematics and von Mises-Fisher sampling
* Panoramic stereo video textures
* Parallelizable inpainting and refinement of diffeomorphisms using Beltrami holomorphic flow
* Parsing video events with goal inference and intent prediction
* Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models
* Physically-based motion models for 3D tracking: A convex formulation
* Point-based calibration using a parametric representation of the general imaging model
* Pose estimation from reflections for specular surface recovery
* Pose, illumination and expression invariant pairwise face-similarity measure via Doppelgänger list comparison
* Positive definite dictionary learning for region covariances
* power of comparative reasoning, The
* Predicting occupation via human clothing and contexts
* Probabilistic 3D object recognition with both positive and negative evidences
* Probabilistic group-level motion analysis and scenario recognition
* Probabilistic image segmentation with closedness constraints
* Pushing the limits of digital imaging using structured illumination
* Random ensemble metrics for object recognition
* Real-time indoor scene understanding using Bayesian filtering with motion cues
* Realtime multibody visual SLAM with a smoothly moving monocular camera
* Recognising spontaneous facial micro-expressions
* Recognizing jumbled images: The role of local and global information in image classification
* RECON: Scale-adaptive robust estimation via Residual Consensus
* Recursive MDL via graph cuts: Application to segmentation
* Refractive shape from light field distortion
* Regression from local features for viewpoint and pose estimation
* Relative attributes
* revisit to cost aggregation in stereo matching: How far can we reduce its computational redundancy?, A
* Revisiting radiometric calibration for color computer vision
* Robust and efficient parametric face alignment
* Robust consistent correspondence between 3D non-rigid shapes based on Dual Shape-DNA
* Robust object pose estimation via statistical manifold modeling
* robust pipeline for rapid feature-based pre-alignment of dense range scans, A
* Robust topological features for deformation invariant image matching
* Robust unsupervised motion pattern inference from video and applications
* Salient object detection by composition
* Salient Object Detection using concavity context
* Scalable object-class retrieval with approximate and top-k ranking
* Scale and object aware image retargeting for thumbnail browsing
* Scale space for central catadioptric systems: Towards a generic camera feature extractor
* Scan rectification for structured light range sensors with rolling shutters
* Scene recognition and weakly supervised object localization with deformable part-based models
* Segmentation as selective search for object recognition
* Segmentation from a box
* Segmentation fusion for connectomics
* selective spatio-temporal interest point detector for human action recognition in complex scenes, A
* Self-calibrating depth from refraction
* Semantic contours from inverse detectors
* Semi-supervised learning and optimization for hypergraph matching
* Shading-Based Dynamic Shape Refinement from Multi-View Video under General Illumination
* Shape-Constrained Gaussian Process Regression for Facial-Point-Based Head-Pose Normalization
* Shared shape spaces
* Silhouette-based object phenotype recognition using 3D shape priors
* Similarity invariant classification of events by KL divergence minimization
* Simplification of 3D morphable models
* Simultaneous correspondence and non-rigid 3D reconstruction of the coronary tree from single X-ray images
* Simultaneous localization, mapping and deblurring
* Simultaneous multi-body stereo and segmentation
* Single-shot high dynamic range imaging with conventional camera hardware
* Slow feature analysis and decorrelation filtering for separating correlated sources
* Smooth object retrieval using a bag of boundaries
* Sorted Random Projections for robust texture classification
* Source camera identification using Auto-White Balance approximation
* Source constrained clustering
* Sparse dictionary-based representation and recognition of action attributes
* Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance
* Sparse representation or collaborative representation: Which helps face recognition?
* Spatial pyramid co-occurrence for image classification
* Spatio-temporal clustering of probabilistic region trajectories
* Spatiotemporal oriented energies for spacetime stereo
* Spectral learning of latent semantics for action recognition
* Speeded-up, relaxed spatial matching
* Stereo reconstruction using high order likelihood
* Stereo time-of-flight
* StereoCut: Consistent interactive object selection in stereo image pairs
* string of feature graphs model for recognition of complex activities in natural videos, A
* Strong supervision from weak annotation: Interactive training of deformable part models
* Struck: Structured Output Tracking with Kernels
* Structure-Sensitive Superpixels via Geodesic Distance
* Structured class-labels in random forests for semantic image labelling
* Superpixel tracking
* Superpixels via pseudo-Boolean optimization
* Tabula rasa: Model transfer for object category detection
* Tasting families of features for image classification
* Temporally coded flash illumination for motion deblurring
* Text-based image retrieval using progressive multi-instance learning
* theory of Coprime Blurred Pairs, A
* Tight convex relaxations for vector-valued labeling problems
* Towards accurate and efficient representation of image irradiance of convex-Lambertian objects under unknown near lighting
* Tracking by Sampling Trackers
* Tracking multiple people under global appearance constraints
* Trajectory reconstruction from non-overlapping surveillance cameras with relative depth ordering constraints
* Treat samples differently: Object tracking with semi-supervised online CovBoost
* truth about cats and dogs, The
* Understanding egocentric activities
* Understanding scenes on many levels
* Unstructured light scanning to overcome interreflections
* Unsupervised and semi-supervised learning via l1-norm graph
* Unsupervised learning of a scene-specific coarse gaze estimator
* Unsupervised learning of event AND-OR grammar and semantics from video
* Unsupervised metric learning by Self-Smoothing Operator
* Unsupervised metric learning for face identification in TV video
* Unwrapping low-rank textures on generalized cylindrical surfaces
* Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences
* Variational stereo in dynamic illumination
* Video from a single coded exposure photograph using a learned over-complete dictionary
* Video parsing for abnormality detection
* Video Primal Sketch: A generic middle-level representation of video
* Viewpoint invariant 3D landmark model inference from monocular 2D images using higher-order priors
* Viewpoint-aware object detection and pose estimation
* Visual word disambiguation by semantic contexts
* Weakly supervised object detector learning with model drift detection
* Weakly supervised semantic segmentation with a multi-image model
* What an image reveals about material reflectance
* What characterizes a shadow boundary under the sun and sky?
* Who Blocks Who: Simultaneous clothing segmentation for grouping images
340 for ICCV11

* 3D Scene Understanding by Voxel-CRF
* 3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval
* 3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
* Abnormal Event Detection at 150 FPS in MATLAB
* Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
* Accurate Blur Models vs. Image Priors in Single Image Super-resolution
* Action and Event Recognition with Fisher Vectors on a Compact Feature Set
* Action Recognition and Localization by Hierarchical Space-Time Segments
* Action Recognition with Actons
* Action Recognition with Improved Trajectories
* Active Learning of an Action Detector from Untrimmed Videos
* Active MAP Inference in CRFs for Efficient Semantic Segmentation
* Active Visual Recognition with Expertise Estimation in Crowdsourcing
* ACTIVE: Activity Concept Transitions in Video Event Classification
* Adapting Classification Cascades to New Domains
* Adaptive Descriptor Design for Object Recognition in the Wild, An
* Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
* Allocentric Pose Estimation
* Alternating Regression Forests for Object Detection and Pose Estimation
* Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
* Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
* Attribute Adaptation for Personalized Image Search
* Attribute Dominance: What Pops Out?
* Attribute Pivots for Guiding Relevance Feedback in Image Search
* Automatic Kronecker Product Model Based Detection of Repeated Patterns in 2D Urban Images
* Automatic Registration of RGB-D Scans via Salient Directions
* Bayesian 3D Tracking from Monocular Video
* Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
* Bayesian Robust Matrix Factorization for Image and Video Processing
* Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition
* Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
* BOLD Features to Detect Texture-less Objects
* Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
* Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
* Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
* Building Part-Based Object Detectors via 3D Geometry
* Calibration-Free Gaze Estimation Using Human Gaze Patterns
* Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
* Capturing Global Semantic Relationships for Facial Action Unit Recognition
* Cascaded Shape Space Pruning for Robust Facial Landmark Detection
* Category-Independent Object-Level Saliency Detection
* Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
* Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
* Co-segmentation by Composition
* Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
* CoDeL: A Human Co-detection and Labeling Framework
* Codemaps: Segment, Classify and Search Objects Locally
* Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
* Coherent Object Detection with 3D Geometric Context from a Single Image
* Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
* Color Constancy Model with Double-Opponency Mechanisms, A
* Combining the Right Features for Complex Event Recognition
* Compensating for Motion during Direct-Global Separation
* Complementary Projection Hashing
* Complex 3D General Object Reconstruction from Line Drawings
* Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
* Concurrent Action Detection with Structural Prediction
* Conservation Tracking
* Constant Time Weighted Median Filtering for Stereo Matching and Beyond
* Constructing Adaptive Complex Cells for Robust Visual Tracking
* Content-Aware Rotation
* Contextual Hypergraph Modeling for Salient Object Detection
* Convex Optimization Framework for Active Learning, A
* Corrected-Moment Illuminant Estimation
* Correlation Adaptive Subspace Segmentation by Trace Lasso
* Correntropy Induced L2 Graph for Robust Subspace Clustering
* Cosegmentation and Cosketch by Unsupervised Learning
* Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition
* Coupling Alignments with Recognition for Still-to-Video Face Recognition
* Cross-Field Joint Image Restoration via Scale Map
* Cross-View Action Recognition Over Heterogeneous Feature Spaces
* Curvature-Aware Regularization on Riemannian Submanifolds
* Data-Driven 3D Primitives for Single Image Understanding
* DCSH: Matching Patches in RGBD Images
* Deblurring by Example Using Dense Correspondence
* Decomposing Bag of Words Histograms
* Deep Learning Identity-Preserving Face Space
* Deep Sum-Product Architecture for Robust Facial Attributes Analysis, A
* DeepFlow: Large Displacement Optical Flow with Deep Matching
* Deformable Mixture Parsing Model with Parselets, A
* Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
* Depth from Combining Defocus and Correspondence Using Light-Field Cameras
* Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
* Detecting Curved Symmetric Parts Using a Deformable Disc Model
* Detecting Dynamic Objects with Multi-view Background Subtraction
* Detecting Irregular Curvilinear Structures in Gray Scale and Color Imagery Using Multi-directional Oriented Flux
* Deterministic Fitting of Multiple Structures Using Iterative MaxFS with Inlier Scale Estimation
* Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution
* Direct Optimization of Frame-to-Frame Rotation
* Directed Acyclic Graph Kernels for Action Recognition
* Discovering Details and Scene Structure with Hierarchical Iconoid Shift
* Discovering Object Functionality
* Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement
* Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
* Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
* Distributed Low-Rank Subspace Segmentation
* Domain Adaptive Classification
* Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information
* Drosophila Embryo Stage Annotation Using Label Propagation
* Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
* Dynamic Pooling for Complex Event Recognition
* Dynamic Probabilistic Volumetric Models
* Dynamic Scene Deblurring
* Dynamic Structured Model Selection
* Efficient 3D Scene Labeling Using Fields of Trees
* Efficient and Robust Large-Scale Rotation Averaging
* Efficient Hand Pose Estimation from a Single Depth Image
* Efficient Higher-Order Clustering on the Grassmann Manifold
* Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
* Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve
* Efficient Salient Region Detection with Soft Image Abstraction
* Elastic Fragments for Dense Scene Reconstruction
* Elastic Net Constraints for Shape Matching
* Enhanced Continuous Tabu Search for Parameter Estimation in Multiview Geometry
* Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points, An
* Ensemble Projection for Semi-supervised Image Classification
* Estimating Human Pose with Flowing Puppets
* Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
* Estimating the Material Properties of Fabric from Video
* Event Detection in Complex Scenes Using Interval Temporal Constraints
* Event Recognition in Photo Collections with a Stopwatch HMM
* EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory
* Example-Based Facade Texture Synthesis
* Exemplar Cut
* Exemplar-Based Graph Matching for Robust Facial Landmark Localization
* Exploiting Reflection Change for Automatic Reflection Removal
* Extrinsic Camera Calibration without a Direct View Using Spherical Mirror
* Face Recognition Using Face Patch Networks
* Face Recognition via Archetype Hull Ranking
* Facial Action Unit Event Detection by Cascade of Tasks
* Fast Direct Super-Resolution by Simple Functions
* Fast Face Detector Training Using Tailored Views
* Fast High Dimensional Vector Multiplication Face Recognition
* Fast Neighborhood Graph Search Using Cartesian Concatenation
* Fast Object Segmentation in Unconstrained Video
* Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
* Fast Subspace Search via Grassmannian Based Hashing
* Feature Weighting via Optimal Thresholding for Video Analysis
* Fibonacci Exposure Bracketing for High Dynamic Range Imaging
* Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
* Finding Actors and Actions in Movies
* Finding Causal Interactions in Video Sequences
* Finding the Best from the Second Bests: Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms
* Fine-Grained Categorization by Alignments
* Fingerspelling Recognition with Semi-Markov Conditional Random Fields
* Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
* Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera, A
* Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
* Forward Motion Deblurring
* Framework for Shape Analysis via Hilbert Space Embedding, A
* From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
* From Large Scale Image Categorization to Entry-Level Categories
* From Point to Set: Extend the Learning of Distance Metrics
* From Semi-supervised to Transfer Counting of Crowds
* From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
* From Where and How to What We See
* Frustratingly Easy NBNN Domain Adaptation
* Fully Hierarchical Approach for Finding Correspondences in Non-rigid Shapes, A
* General Dense Image Matching Framework Combining Direct and Feature-Based Costs, A
* General Two-Step Approach to Learning-Based Hashing, A
* Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding, A
* Generalized Low-Rank Appearance Model for Spatio-temporally Correlated Rain Streaks, A
* Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach, A
* Geometric Registration Based on Distortion Estimation
* Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion
* Global Linear Method for Camera Pose Registration, A
* Go-ICP: Solving 3D Registration Efficiently and Globally Optimally
* GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity
* GrabCut in One Cut
* Group Norm for Learning Structured SVMs with Unstructured Latent Variables
* Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
* Handling Occlusions with Franken-Classifiers
* Handling Uncertain Tags in Visual Recognition
* Handwritten Word Spotting with Corrected Attributes
* Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification
* Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
* Hidden Factor Analysis for Age Invariant Face Recognition
* Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
* Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
* Hierarchical Part Matching for Fine-Grained Visual Categorization
* High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
* Higher Order Matching for Consistent Multiple Target Tracking
* HOGgles: Visualizing Object Detection Features
* Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
* How Do You Tell a Blackbird from a Crow?
* How Related Exemplars Help Complex Event Detection in Web Videos?
* Human Attribute Recognition by Rich Appearance Dictionary
* Human Re-identification by Matching Compositional Template with Cluster Sampling
* Illuminant Chromaticity from Image Sequences
* Image Co-segmentation via Consistent Functional Maps
* Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
* Image Retrieval Using Textual Cues
* Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks
* Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning
* Implied Feedback: Learning Nuances of User Behavior in Image Search
* Improving Graph Matching via Density Maximization
* Incorporating Cloud Distribution in Sky Representation
* Inferring Dark Matter and Dark Energy from Videos
* Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
* Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
* Interestingness of Images, The
* Internet Based Morphable Model
* Joint Deep Learning for Pedestrian Detection
* Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution, A
* Joint Inverted Indexing
* Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers
* Joint Noise Level Estimation from Personal Photo Collections
* Joint Optimization for Consistent Multiple Graph Matching
* Joint Segmentation and Pose Tracking of Human in Natural Videos
* Joint Subspace Stabilization for Stereoscopic Video
* Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning
* Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
* Large-Scale Video Hashing via Structure Learning
* Latent Data Association: Bayesian Model Selection for Multi-target Tracking
* Latent Multitask Learning for View-Invariant Action Recognition
* Latent Space Sparse Subspace Clustering
* Latent Task Adaptation with Large-Scale Hierarchies
* Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
* Learning Coupled Feature Spaces for Cross-Modal Matching
* Learning CRFs for Image Parsing with Adaptive Subgradient Descent
* Learning Discriminative Part Detectors for Image Classification and Cosegmentation
* Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes
* Learning Graphs to Match
* Learning Hash Codes with Listwise Supervision
* Learning Maximum Margin Temporal Warping for Action Recognition
* Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection
* Learning People Detectors for Tracking in Crowded Scenes
* Learning Slow Features for Behaviour Analysis
* Learning the Visual Interpretation of Sentences
* Learning to Predict Gaze in Egocentric Video
* Learning to Rank Using Privileged Information
* Learning to Share Latent Tasks for Action Recognition
* Learning View-Invariant Sparse Representations for Cross-View Action Recognition
* Learning-Based Approach to Reduce JPEG Artifacts in Image Matting, A
* Lifting 3D Manhattan Lines from a Single Image
* Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
* Line Assisted Light Field Triangulation and Stereo Matching
* Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
* Live Metric 3D Reconstruction on Mobile Phones
* Local Signal Equalization for Correspondence Matching
* Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
* Log-Euclidean Kernels for Sparse Representation and Dictionary Learning
* Low-Rank Sparse Coding for Image Classification
* Manifold Based Face Synthesis from Sparse Samples
* Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
* Markov Network-Based Unified Classifier for Face Identification
* Matching Dry to Wet Materials
* Max-Margin Perspective on Sparse Representation-Based Classification, A
* Measuring Flow Complexity in Videos
* Method of Perceptual-Based Shape Decomposition, A
* Minimal Basis Facility Location for Subspace Segmentation
* Mining Motion Atoms and Phrases for Complex Action Recognition
* Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation
* Model Recommendation with Virtual Probes for Egocentric Hand Detection
* Modeling 4D Human-Object Interactions for Event and Object Recognition
* Modeling Occlusion by Discriminative AND-OR Structures
* Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
* Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction
* Modifying the Memorability of Face Photographs
* Monocular Image 3D Human Pose Estimation under Self-Occlusion
* Monte Carlo Tree Search for Scheduling Activity Recognition
* Motion-Aware KNN Laplacian for Video Matting
* Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection, The
* Multi-attributed Dictionary Learning for Sparse Coding
* Multi-channel Correlation Filters
* Multi-scale Topological Features for Hand Posture Representation and Analysis
* Multi-stage Contextual Deep Learning for Pedestrian Detection
* Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras
* Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
* Multi-view Object Segmentation in Space and Time
* Multiple Non-rigid Surface Detection and Registration
* Multiview Photometric Stereo Using Planar Mesh Parameterization
* Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
* NEIL: Extracting Visual Knowledge from Web Data
* Nested Shape Descriptors
* Network Principles for SfM: Disambiguating Repeated Structures with Local Context
* New Adaptive Segmental Matching Measure for Human Activity Recognition, A
* New Graph Structured Sparsity Model for Multi-label Image Annotations
* New Image Quality Metric for Image Auto-denoising, A
* No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion
* Non-convex P-Norm Projection for Robust Sparsity
* Non-parametric Bayesian Network Prior of Human Pose, A
* Nonparametric Blind Super-resolution
* Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models, A
* NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
* Offline Mobile Instance Retrieval with a Small Memory Footprint
* On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
* On the Mean Curvature Flow on Graphs with Applications in Image and Manifold Processing
* Online Motion Segmentation Using Dynamic Label Propagation
* Online Robust Non-negative Dictionary Learning for Visual Tracking
* Online Video SEEDS for Temporal Window Objectness
* Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
* Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
* Optimization Problems for Fast AAM Fitting in-the-Wild
* Orderless Tracking through Model-Averaged Posterior Estimation
* Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
* Parallel Transport of Deformations in Shape Space of Elastic Surfaces
* Parsing IKEA Objects: Fine Pose Estimation
* Partial Enumeration and Curvature Regularization
* Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
* Pedestrian Parsing via Deep Decompositional Network
* Perceptual Fidelity Aware Mean Squared Error
* Person Re-identification by Salience Matching
* Perspective Motion Segmentation via Collaborative Clustering
* PhotoOCR: Reading Text in Uncontrolled Conditions
* Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
* Piecewise Rigid Scene Flow
* PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
* PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
* Point-Based 3D Reconstruction of Thin Objects
* POP: Person Re-identification Post-rank Optimisation
* Pose Estimation and Segmentation of People in 3D Movies
* Pose Estimation with Unknown Focal Length Using Points, Directions and Lines
* Pose-Configurable Generic Tracking of Elongated Objects
* Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model
* Potts Model, Parametric Maxflow and K-Submodular Functions
* Practical Transfer Learning Algorithm for Face Verification, A
* Predicting an Object Location Using a Global Image Representation
* Predicting Primary Gaze Behavior Using Social Saliency Fields
* Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
* Prime Object Proposals with Randomized Prim's Algorithm
* Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
* Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
* Proportion Priors for Image Sequence Segmentation
* Pyramid Coding for Functional Scene Element Recognition in Video Scenes
* Quadruplet-Wise Image Similarity Learning
* Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval
* Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval
* Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
* Random Forests of Local Experts for Pedestrian Detection
* Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
* Randomized Ensemble Tracking
* Rank Minimization across Appearance and Shape for AAM Ensemble Fitting
* Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
* Real-Time Body Tracking with One Depth Camera and Inertial Sensors
* Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length
* Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
* Recognising Human-Object Interaction via Exemplar Based Modelling
* Recognizing Text with Perspective Distortion in Natural Scenes
* Rectangling Stereographic Projection for Wide-Angle Image Visualization
* Recursive Estimation of the Stein Center of SPD Matrices and Its Applications
* Refractive Structure-from-Motion on Underwater Images
* Regionlets for Generic Object Detection
* Relative Attributes for Large-Scale Abandoned Object Detection
* Restoring an Image Taken through a Window Covered with Dirt or Rain
* Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees
* Revisiting the PnP Problem: A Fast, General and Optimal Solution
* Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration, A
* Robust Dictionary Learning by Error Source Decomposition
* Robust Face Landmark Estimation under Occlusion
* Robust Feature Set Matching for Partial Face Recognition
* Robust Matrix Factorization with Unknown Noise
* Robust Non-parametric Data Fitting for Correspondence Modeling
* Robust Object Tracking with Online Multi-Lifespan Dictionary Learning
* Robust Subspace Clustering via Half-Quadratic Minimization
* Robust Trajectory Clustering for Motion Segmentation
* Robust Tucker Tensor Decomposition for Effective Image Representation
* Rolling Shutter Stereo
* Rotational Stereo Model Based on XSlit Imaging, A
* Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics
* Saliency Detection in Large Point Sets
* Saliency Detection via Absorbing Markov Chain
* Saliency Detection via Dense and Sparse Reconstruction
* Saliency Detection: A Boolean Map Approach
* Salient Region Detection by UFO: Uniqueness, Focusness and Objectness
* Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-Dimensional Visual Data, A
* Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
* Scene Text Localization and Recognition with Oriented Stroke Detection
* Segmentation Driven Object Detection with Fisher Vectors
* Semantic Segmentation without Annotating Segments
* Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
* Semantic-Aware Co-Indexing for Image Retrieval
* Semantically-Based Human Scanpath Estimation with HMMs
* Semi-dense Visual Odometry for a Monocular Camera
* Semi-supervised Learning for Large Scale Image Cosegmentation
* Semi-supervised Robust Dictionary Learning via Efficient l_{2,0+}-Norms Minimization
* Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain
* Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
* SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition
* Shape Anchors for Data-Driven Multi-view Reconstruction
* Shape Index Descriptors Applied to Texture-Based Galaxy Analysis
* Shortest Paths with Curvature and Torsion
* Shufflets: Shared Mid-level Parts for Fast Object Detection
* Sieving Regression Forest Votes for Facial Feature Detection in the Wild
* SIFTpack: A Compact Representation for Efficient SIFT Matching
* Similarity Metric Learning for Face Recognition
* Simple Model for Intrinsic Image Decomposition with Depth Cues, A
* Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
* Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal
* Slice Sampling Particle Belief Propagation
* Space-Time Robust Representation for Action Recognition
* Space-Time Tradeoffs in Photo Sequencing
* Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
* Sparsifying Neural Network Connections for Face Recognition
* Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
* Stable Hyper-pooling and Query Expansion for Event Detection
* Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology
* STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
* Street View Motion-from-Structure-from-Motion
* Strong Appearance and Expressive Spatial Models for Human Pose Estimation
* Structured Forests for Fast Edge Detection
* Structured Learning of Sum-of-Submodular Higher Order Energy Functions
* Structured Light in Sunlight
* Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
* Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length
* SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
* Super-resolution via Transform-Invariant Group-Sparse Regularization
* Supervised Binary Hash Code Learning with Jensen Shannon Divergence
* Support Surface Prediction in Indoor Scenes
* SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor
* Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
* Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
* Target-Driven Moire Pattern Synthesis by Phase Modulation
* Temporally Consistent Superpixels
* Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
* To Aggregate or Not to Aggregate: Selective Match Kernels for Image Search
* Topology-Constrained Layered Tracking with Latent Flow
* Total Variation Regularization for Functions with Values in a Manifold
* Toward Guaranteed Illumination Models for Non-convex Objects
* Towards Motion Aware Light Field Video for Dynamic Scenes
* Towards Understanding Action Recognition
* Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
* Tracking via Robust Multi-task Multi-View Joint Sparse Representation
* Training Deformable Part Models with Decorrelated Features
* Transfer Feature Learning with Joint Distribution Adaptation
* Translating Video Content to Natural Language Descriptions
* Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
* Two-Point Gait: Decoupling Gait from Body Shape
* Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias
* Uncertainty-Driven Efficiently-Sampled Sparse Graphical Models for Concurrent Tumor Segmentation and Atlas Registration
* Understanding High-Level Semantics by Modeling Traffic Patterns
* Unified Probabilistic Approach Modeling Relationships between Attributes and Objects, A
* Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration, A
* Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis, A
* Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition
* Unsupervised Domain Adaptation by Domain Invariant Projection
* Unsupervised Intrinsic Calibration from a Single Frame Using a Plumb-Line Approach
* Unsupervised Random Forest Manifold Alignment for Lipreading
* Unsupervised Visual Domain Adaptation Using Subspace Alignment
* Video Co-segmentation for Meaningful Action Extraction
* Video Event Understanding Using Natural Language Descriptions
* Video Motion for Every Visible Point
* Video Segmentation by Tracking Many Figure-Ground Segments
* Video Synopsis by Heterogeneous Multi-source Correlation
* Viewing Real-World Faces in 3D
* Visual Reranking through Weakly Supervised Multi-graph Learning
* Visual Semantic Complex Network for Web Images
* Volumetric Semantic Segmentation Using Pyramid Context Features
* Way They Move: Tracking Multiple Targets with Similar Appearance, The
* Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
* What Do You Do? Occupation Recognition in a Photo via Social Context
* What is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?
* Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
* YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
* 3D Fragment Reassembly Using Integrated Template Guidance and Fracture-Region Matching
* 3D Hand Pose Estimation Using Randomized Decision Forest with Segmentation Index Points
* 3D Object Reconstruction from Hand-Object Interactions
* 3D Surface Profilometry Using Phase Shifting of De Bruijn Pattern
* 3D Time-Lapse Reconstruction from Internet Photos
* 3D-Assisted Feature Synthesis for Novel Views of an Object
* Accurate Camera Calibration Robust to Defocus Using a Smartphone
* Accurate Iris Segmentation Framework Under Relaxed Imaging Constraints Using Total Variation Model, An
* Action Detection by Implicit Intentional Motion Clustering
* Action Localization in Videos through Context Walk
* Action Recognition by Hierarchical Mid-Level Action Elements
* Actionness-Assisted Recognition of Actions
* Actions and Attributes from Wholes and Parts
* Active Object Localization with Deep Reinforcement Learning
* Active One-Shot Scan for Wide Depth Range Using a Light Field Projector Based on Coded Aperture
* Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks
* Activity Auto-Completion: Predicting Human Activities from Partial Videos
* Adaptive Data Representation for Robust Point-Set Registration and Merging, An
* Adaptive Dither Voting for Robust Spatial Verification
* Adaptive Exponential Smoothing for Online Filtering of Pixel Prediction Maps
* Adaptive Hashing for Fast Similarity Search
* Adaptive Spatial-Spectral Dictionary Learning for Hyperspectral Image Denoising
* Adaptively Unified Semi-Supervised Dictionary Learning with Active Points
* Additive Nearest Neighbor Feature Maps
* Aggregating Local Deep Features for Image Retrieval
* Airborne Three-Dimensional Cloud Tomography
* Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
* Alternating Co-Quantization for Cross-Modal Hashing
* Amodal Completion and Size Constancy in Natural Scenes
* As-Rigid-as-Possible Volumetric Shape-from-Template
* Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
* AttentionNet: Aggregating Weak Directions for Accurate Object Detection
* Attribute-Graph: A Graph Based Approach to Image Ranking
* Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose
* Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization
* Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face
* Automatic Concept Discovery from Parallel Text and Visual Corpora
* Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability
* Bayesian Model Adaptation for Crowd Counts
* Bayesian Non-parametric Inference for Manifold Based MoCap Representation
* Beyond Covariance: Feature Representation with Nonlinear Kernel Matrices
* Beyond Gauss: Image-Set Matching on the Riemannian Manifold of PDFs
* Beyond Tree Structure Models: A New Occlusion Aware Graphical Model for Human Pose Estimation
* Beyond White: Ground Truth Colors for Color Constancy Correction
* Bi-Shifting Auto-Encoder for Unsupervised Domain Adaptation
* Bilinear CNN Models for Fine-Grained Visual Recognition
* Blur-Aware Disparity Estimation from Defocus Stereo Images
* BodyPrint: Pose Invariant 3D Shape Matching of Human Bodies
* Boosting Object Proposals: From Pascal to COCO
* Box Aggregation for Proposal Decimation: Last Mile of Object Detection
* BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
* BubbLeNet: Foveated Imaging for Visual Discovery
* Building Dynamic Cloud Maps from the Ground Up
* Camera Pose Voting for Large-Scale Image-Based Localization
* Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models
* Cascaded Sparse Spatial Bins for Efficient and Effective Generic Object Detection
* Category-Blind Human Action Recognition: A Practical Recognition System
* Class-Specific Image Deblurring
* Classical Scaling Revisited
* Cluster-Based Point Set Saliency
* Co-Interest Person Detection from Multiple Wearable Camera Videos
* Collaborative Filtering Approach to Real-Time Hand Pose Estimation, A
* Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images
* Complementary Sets of Shutter Sequences for Motion Deblurring
* Component-Wise Modeling of Articulated Objects
* Compositional Hierarchical Representation of Shape Manifolds for Classification of Non-manifold Shapes
* Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms, A
* Compression Artifacts Reduction by a Deep Convolutional Network
* Conditional Convolutional Neural Network for Modality-Aware Face Recognition
* Conditional High-Order Boltzmann Machine: A Supervised Learning Model for Relation Learning
* Conditional Random Fields as Recurrent Neural Networks
* Conditioned Regression Models for Non-blind Single Image Super-Resolution
* Confidence Preserving Machine for Facial Action Unit Detection
* Conformal and Low-Rank Sparse Representation for Image Restoration
* Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
* Context Aware Active Learning of Activity Recognition Models
* Context-Aware CNNs for Person Head Detection
* Context-Guided Diffusion for Label Propagation on Graphs
* Contextual Action Recognition with R*CNN
* Continuous Pose Estimation with a Spatial Ensemble of Fisher Regressors
* Contour Box: Rejecting Object Proposals without Explicit Closed Contours
* Contour Detection and Characterization for Asynchronous Event Sensors
* Contour Flow: Middle-Level Motion Estimation by Combining Motion Segmentation and Contour Alignment
* Contour Guided Hierarchical Model for Shape Matching
* Contractive Rectifier Networks for Nonlinear Maximum Margin Classification
* Convex Optimization with Abstract Linear Operators
* Convolutional Channel Features
* Convolutional Color Constancy
* Convolutional Sparse Coding for Image Super-Resolution
* COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation
* Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network
* Cutting Edge: Soft Correspondences in Multimodal Scene Parsing
* CV-HAZOP: Introducing Test Data Validation for Computer Vision
* Data-Driven Metric for Comprehensive Evaluation of Saliency Models, A
* Deep Colorization
* Deep Fried Convnets
* Deep Learning Face Attributes in the Wild
* Deep Learning Strong Parts for Pedestrian Detection
* Deep Multi-patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation
* Deep Networks for Image Super-Resolution with Sparse Prior
* Deep Neural Decision Forests
* Deep Visual Correspondence Embedding Model for Stereo Matching Costs, A
* DeepBox: Learning Objectness with Convolutional Networks
* DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
* DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers
* Deformable 3D Fusion: From Partial Dynamic 3D Observations to Complete 4D Models
* Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
* Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras
* Dense Image Registration and Deformable Surface Reconstruction in Presence of Occlusions and Minimal Texture
* Dense Optical Flow Prediction from a Static Image
* Dense Semantic Correspondence Where Every Pixel is a Classifier
* Depth Map Estimation and Colorization of Anaglyph Images Using Local Color Prior and Reverse Intensity Distribution
* Depth Recovery from Light Field Using Focal Stack Symmetry
* Depth Selective Camera: A Direct, On-Chip, Programmable Technique for Depth Selectivity in Photography
* Depth-Based Hand Pose Estimation: Data, Methods, and Challenges
* Describing Videos by Exploiting Temporal Structure
* Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences
* Detection and Segmentation of 2D Curved Reflection Symmetric Structures
* Differential Recurrent Neural Networks for Action Recognition
* Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression
* Direct, Dense, and Deformable: Template-Based Non-rigid 3D Reconstruction from RGB Video
* Discovering the Spatial Extent of Relative Attributes
* Discrete Tabu Search for Graph Matching
* Discriminative Learning of Deep Convolutional Feature Point Descriptors
* Discriminative Low-Rank Tracking
* Discriminative Pose-Free Descriptors for Face and Object Matching
* Domain Generalization for Object Recognition with Multi-task Autoencoders
* Dual-Feature Warping-Based Motion Model Estimation
* Dynamic Texture Recognition via Orthogonal Tensor Dictionary Learning
* Efficient Classifier Training to Minimize False Merges in Electron Microscopy Segmentation
* Efficient Decomposition of Image and Mesh Graphs by Lifted Multicuts
* Efficient Minimal Solution for Multi-camera Motion, An
* Efficient PSD Constrained Asymmetric Metric Learning for Person Re-Identification
* Efficient Solution to the Epipolar Geometry for Radially Distorted Cameras
* Efficient Statistical Method for Image Noise Level Estimation, An
* Efficient Video Segmentation Using Parametric Graph Partitioning
* Enhancing Road Maps by Parsing Aerial Images Around the World
* Entropy Minimization for Convex Relaxation Approaches
* Entropy-Based Latent Structured Output Prediction
* Example-Based Modeling of Facial Texture from Deficient Data
* Exploiting High Level Scene Cues in Stereo Reconstruction
* Exploiting Object Similarity in 3D Reconstruction
* Exploration of Parameter Redundancy in Deep Networks with Circulant Projections, An
* Exploring Causal Relationships in Visual Object Tracking
* Extended Depth of Field Catadioptric Imaging Using Focal Sweep
* External Patch Prior Guided Internal Clustering for Image Denoising
* Extraction of Virtual Baselines from Distorted Document Images Using Curvilinear Projection
* Face Flow
* FaceDirector: Continuous Control of Facial Performance in Video
* Fast and Accurate Head Pose Estimation via Random Projection Forests
* Fast and Effective L_0 Gradient Minimization by Region Fusion
* Fast Orthogonal Projection Based on Kronecker Product
* Fast R-CNN
* FASText: Efficient Unconstrained Scene Text Detector
* Fill and Transfer: A Simple Physics-Based Approach for Containability Reasoning
* Fine-Grained Change Detection of Misaligned Scenes with Varied Illuminations
* Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
* Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation
* Flowing ConvNets for Human Pose Estimation in Videos
* FlowNet: Learning Optical Flow with Convolutional Networks
* FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation
* Frequency-Based Environment Matting by Compressive Sensing
* From Emotions to Action Units with Hidden and Semi-Hidden-Task Learning
* From Facial Parts Responses to Face Detection: A Deep Learning Approach
* Fully Connected Guided Image Filtering
* Fully Connected Object Proposals for Video Segmentation
* Gaussian Process Latent Variable Model for BRDF Inference, A
* General Dynamic Scene Reconstruction from Multiple View Video
* Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!
* Generic Promotion of Diffusion-Based Salient Object Detection
* Geometry-Aware Deep Transform
* Global Structure-from-Motion by Similarity Averaging
* Global, Dense Multiscale Reconstruction for a Billion Points
* Globally Optimal 2D-3D Registration from Points or Lines without Correspondences
* Group Membership Prediction
* Groupwise Multilinear Correspondence Optimization for 3D Faces, A
* Guaranteed Outlier Removal for Rotation Search
* Guiding the Long-Short Term Memory Model for Image Caption Generation
* HARF: Hierarchy-Associated Rich Features for Salient Object Detection
* Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
* HCI Stereo Metrics: Geometry-Aware Performance Analysis of Stereo Algorithms, The
* HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition
* HICO: A Benchmark for Recognizing Human-Object Interactions in Images
* Hierarchical Convolutional Features for Visual Tracking
* Hierarchical Higher-Order Regression Forest Fields: An Application to 3D Indoor Scene Labelling
* High Quality Structure from Small Motion for Rolling Shutter Cameras
* High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and Its Applications to High-Level Vision
* Higher-Order CRF Structural Segmentation of 3D Reconstructed Surfaces
* Higher-Order Inference for Multi-class Log-Supermodular Models
* Highly-Expressive Spaces of Well-Behaved Transformations: Keeping it Simple
* Holistically-Nested Edge Detection
* Hot or Not: Exploring Correlations between Appearance and Temperature
* Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks
* Human Parsing with Contextualized Convolutional Neural Network
* Human Pose Estimation in Videos
* Hyperpoints and Fine Vocabularies for Large-Scale Location Recognition
* Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior
* Hyperspectral Super-Resolution by Coupled Spectral Unmixing
* Illumination Robust Color Naming via Label Propagation
* Im2Calories: Towards an Automated Mobile Vision Food Diary
* Image Matting with KL-Divergence Based Sparse Sampling
* Improving Ferns Ensembles by Sparsifying and Quantising Posterior Probabilities
* Improving Image Classification with Location Context
* Improving Image Restoration with Soft-Rounding
* Inferring M-Best Diverse Labelings in a Single One
* Infinite Feature Selection
* Integrating Dashcam Views through Inter-Video Mapping
* Interactive Visual Hull Refinement for Specular and Transparent Object Surface Reconstruction
* Interpolation on the Manifold of K Component GMMs
* Intrinsic Decomposition of Image Sequences from Local Temporal Variations
* Intrinsic Depth: Improving Depth Transfer with Intrinsic Images
* Intrinsic Scene Decomposition from RGB-D Images
* Introducing Geometry in Active Learning for Image Segmentation
* Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-view Stereo
* Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition
* Joint Image Handbook, The
* Joint Object and Part Segmentation Using Deep Learned Potentials
* Joint Optimization of Segmentation and Color Clustering
* Joint Probabilistic Data Association Revisited
* Just Noticeable Differences in Visual Attributes
* kNN Hashing with Factorized Neighborhood Representation
* Large Displacement 3D Scene Flow with Occlusion Reasoning
* Learning a Descriptor-Specific 3D Keypoint Detector
* Learning a Discriminative Model for the Perception of Realism in Composite Images
* Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images
* Learning Binary Codes for Maximum Inner Product Search
* Learning Common Sense through Visual Abstraction
* Learning Complexity-Aware Cascades for Deep Pedestrian Detection
* Learning Concept Embeddings with Combined Human-Machine Expertise
* Learning Data-Driven Reflectance Priors for Intrinsic Image Decomposition
* Learning Deconvolution Network for Semantic Segmentation
* Learning Deep Object Detectors from 3D Models
* Learning Deep Representation with Large-Scale Attributes
* Learning Discriminative Reconstructions for Unsupervised Outlier Removal
* Learning Ensembles of Potential Functions for Structured Prediction with Latent Variables
* Learning Image and User Features for Recommendation in Social Networks
* Learning Image Representations Tied to Ego-Motion
* Learning Informative Edge Maps for Indoor Scene Layout Prediction
* Learning Large-Scale Automatic Image Colorization
* Learning Like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
* Learning Nonlinear Spectral Filters for Color Image Reconstruction
* Learning Ordinal Relationships for Mid-Level Vision
* Learning Parametric Distributions for Image Super-Resolution: Where Patch Matching Meets Sparse Coding
* Learning Query and Image Similarities with Ranking Canonical Correlation Analysis
* Learning Semi-Supervised Representation Towards a Unified Optimization Framework for Semi-Supervised Learning
* Learning Shape, Motion and Elastic Models in Force Space
* Learning Social Relation Traits from Face Images
* Learning Spatially Regularized Correlation Filters for Visual Tracking
* Learning Spatiotemporal Features with 3D Convolutional Networks
* Learning Temporal Embeddings for Complex Video Analysis
* Learning the Structure of Deep Convolutional Networks
* Learning to Boost Filamentary Structure Segmentation
* Learning to Combine Mid-Level Cues for Object Proposal Generation
* Learning to Divide and Conquer for Online Multi-target Tracking
* Learning to Predict Saliency on Face Images
* Learning to Rank Based on Subsequences
* Learning to See by Moving
* Learning to Track for Spatio-Temporal Action Localization
* Learning to Track: Online Multi-object Tracking by Decision Making
* Learning to Transfer: Transferring Latent Task Structures and Its Application to Person-Specific Facial Action Unit Detection
* Learning Visual Clothing Style with Heterogeneous Dyadic Co-Occurrences
* Learning Where to Position Parts in 3D
* Leave-One-Out Kernel Optimization for Shadow Detection
* Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions
* Leveraging Datasets with Varying Annotations for Face Alignment via Deep Regression Network
* LEWIS: Latent Embeddings for Word Images and Their Semantics
* Likelihood-Ratio Test and Efficient Robust Estimation, The
* Linear Generalized Camera Calibration from Three Intersecting Reference Planes, A
* Linearization to Nonlinear Learning for Visual Tracking
* Listening with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines
* Live Repetition Counting
* Local Convolutional Features with Unsupervised Training for Image Retrieval
* Local Subspace Collaborative Tracking
* Localize Me Anywhere, Anytime: A Multi-task Point-Retrieval Approach
* Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks
* Lost Shopping! Monocular Localization in Large Indoor Spaces
* Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
* Low Dimensional Explicit Feature Maps
* Low-Rank Matrix Factorization under General Mixture Noise Distributions
* Low-Rank Tensor Approximation with Laplacian Scale Mixture Modeling for Multiframe Image Denoising
* Low-Rank Tensor Constrained Multiview Subspace Clustering
* MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking
* MAP Disparity Estimation Using Hidden Markov Trees
* Massively Parallel Multiview Stereopsis by Surface Normal Diffusion
* Matrix Backpropagation for Deep Networks with Structured Layers
* Matrix Decomposition Perspective to Multiple Graph Matching, A
* Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation
* Merging the Unmatchable: Stitching Visually Disconnected SfM Models
* MeshStereo: A Global Stereo Model with Mesh Alignment Regularization for View Interpolation
* Middle Child Problem: Revisiting Parametric Min-Cut and Seeds for Object Proposals, The
* Minimal Solvers for 3D Geometry from Satellite Imagery
* Minimizing Human Effort in Interactive Tracking by Incremental Learning of Model Parameters
* Minimum Barrier Salient Object Detection at 80 FPS
* Mining And-Or Graphs for Graph Matching and Object Discovery
* ML-MG: Multi-label Learning with Missing Labels Using a Mixed Graph
* MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition
* Mode-Seeking on Hypergraphs for Robust Geometric Model Fitting
* Model-Based Tracking at 300Hz Using Raw Time-of-Flight Observations
* Monocular Object Instance Segmentation and Depth Ordering with CNNs
* Motion Trajectory Segmentation via Minimum Cost Multicuts
* MRF-Poselets Model for Detecting Highly Articulated Humans, An
* Multi-class Multi-annotator Active Learning with Robust Gaussian Process for Visual Recognition
* Multi-conditional Latent Variable Model for Joint Facial Action Unit Detection
* Multi-cue Structure Preserving MRF for Unconstrained Video Segmentation
* Multi-image Matching via Fast Alternating Minimization
* Multi-kernel Correlation Filter for Visual Tracking
* Multi-label Cross-Modal Retrieval
* Multi-Scale Learning for Low-Resolution Person Re-Identification
* Multi-scale Recognition with DAG-CNNs
* Multi-Task Learning with Low Rank Attribute Embedding for Person Re-Identification
* Multi-task Recurrent Neural Network for Immediacy Prediction
* Multi-View Complementary Hash Tables for Nearest Neighbor Search
* Multi-view Convolutional Neural Networks for 3D Shape Recognition
* Multi-view Domain Generalization for Visual Recognition
* Multi-view Subspace Clustering
* Multimodal Convolutional Neural Networks for Matching Image and Sentence
* Multiple Feature Fusion via Weighted Entropy for Visual Tracking
* Multiple Granularity Descriptors for Fine-Grained Categorization
* Multiple Hypothesis Tracking Revisited
* Multiple-Hypothesis Affine Region Estimation with Anisotropic LoG Filters
* Multiresolution Hierarchy Co-Clustering for Semantic Segmentation in Sequences with Small Variations
* Multiscale Variable-Grouping Framework for MRF Energy Minimization, A
* Mutual-Structure for Joint Filtering
* Naive Bayes Super-Resolution Forest
* Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor
* Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
* Nighttime Haze Removal with Glow and Multiple Light Colors
* NMF Perspective on Binary Hashing, An
* Non-parametric Structure-Based Calibration of Radially Symmetric Cameras
* Nonparametric Bayesian Approach toward Stacked Convolutional Independent Component Analysis, A
* Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images, A
* Novel Sparsity Measure for Tensor Recovery, A
* Object Detection Using Generalization and Efficiency Balanced Co-Occurrence Features
* Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model
* Objects2action: Classifying and Localizing Actions without Any Video Example
* Occlusion-Aware Depth Estimation Using Light-Field Cameras
* On Linear Structure from Motion for Light Field Cameras
* On Statistical Analysis of Neuroimages with Imperfect Registration
* On the Equivalence of Moving Entrance Pupil and Radial Distortion for Camera Calibration
* On the Visibility of Point Clouds
* One Shot Learning via Compositions of Meaningful Patches
* One Triangle Three Parallelograms Sampling Strategy and Its Application in Shape Regression, The
* Online Object Tracking with Proposal Selection
* Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose
* Optimizing Expected Intersection-Over-Union with Candidate-Constrained CRFs
* Optimizing the Viewing Graph for Structure-from-Motion
* Oriented Light-Field Windows for Scene Flow
* Oriented Object Proposals
* P-CNN: Pose-Based CNN Features for Action Recognition
* Pairwise Conditional Random Forests for Facial Expression Recognition
* Pan-Sharpening with a Hyper-Laplacian Penalty
* Panoptic Studio: A Massively Multiview System for Social Motion Capture
* Parsimonious Labeling
* Partial Person Re-Identification
* Patch Group Based Nonlocal Self-Similarity Prior Learning for Image Denoising
* PatchMatch-Based Automatic Lattice Detection for Near-Regular Textures
* Pedestrian Travel Time Estimation in Crowded Scenes
* Peeking Template Matching for Depth Extension
* Per-Sample Kernel Adaptation for Visual Recognition and Grouping
* Person Re-Identification Ranking Optimisation by Discriminant Context Information Analysis
* Person Re-Identification with Correspondence Structure Learning
* Person Re-Identification with Discriminatively Trained Viewpoint Invariant Dictionaries
* Person Recognition in Personal Photo Collections
* Personalized Age Progression with Aging Dictionary
* Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction
* Photometric Stereo in a Scattering Medium
* Photometric Stereo with Small Angular Variations
* Piecewise Flat Embedding for Image Segmentation
* PIEFA: Personalized Incremental and Ensemble Face Alignment
* Point Triangulation through Polyhedron Collapse Using the L-inf Norm
* Polarized 3D: High-Quality Depth Sensing with Polarization Cues
* POP Image Fusion: Derivative Domain Image Fusion without Reintegration
* Pose Induction for Novel Object Categories
* Pose-Invariant 3D Face Alignment
* PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
* PQTable: Fast Exact Asymmetric Distance Neighbor Search for Product Quantization Using Hash Tables
* Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions
* Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
* Predicting Good Features for Image Geo-Localization Using Per-Bundle VLAD
* Predicting Multiple Structured Visual Interpretations
* Probabilistic Appearance Models for Segmentation and Classification
* Probabilistic Label Relation Graphs with Ising Models
* Procedural Editing of 3D Building Point Clouds
* Projection Bank: From High-Dimensional Data to Medium-Length Binary Codes
* Projection Free Method for Generalized Eigenvalue Problem with a Nonsmooth Regularizer, A
* Projection onto the Manifold of Elongated Structures for Accurate Extraction
* Query Adaptive Similarity Measure for RGB-D Object Recognition
* Randomized Ensemble Approach to Industrial CT Segmentation, A
* Real-Time Pose Estimation Piggybacked on Object Detection
* Realtime Edge-Based Visual Odometry for a Monocular Camera
* Recurrent Network Models for Human Dynamics
* Recursive Fréchet Mean Computation on the Grassmannian and Its Applications to Computer Vision
* Reflection Modeling for Passive Stereo
* Registering Images to Untextured Geometry Using Average Shading Gradients
* Regressing a 3D Face Shape from a Single Image
* Regressive Tree Structured Model for Facial Landmark Localization
* Relaxed Multiple-Instance SVM with Application to Object Discovery
* Relaxing from Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging
* Removing Rain from a Single Image via Discriminative Sparse Coding
* Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
* Rendering of Eyes for Eye-Shape Registration and Gaze Estimation
* Resolving Scale Ambiguity via XSlit Aspect Ratio Analysis
* RGB-Guided Hyperspectral Image Upsampling
* RGB-W: When Vision Meets Wireless
* RIDE: Reversal Invariant Descriptor Enhancement
* Robust and Optimal Sum-of-Squares-Based Point-to-Plane Registration of Image Sets and Structured Scenes
* Robust Facial Landmark Detection Under Significant Head Poses and Occlusion
* Robust Heart Rate Measurement from Video Using Select Random Patches
* Robust Image Segmentation Using Contour-Guided Color Palettes
* Robust Model-Based 3D Head Pose Estimation
* Robust Non-rigid Motion Tracking and Surface Reconstruction Using L_0 Regularization
* Robust Nonrigid Registration by Convex Optimization
* Robust Optimization for Deep Regression
* Robust Principal Component Analysis on Graphs
* Robust RGB-D Odometry Using Point and Line Features
* Robust Statistical Face Frontalization
* Rolling Shutter Super-Resolution
* SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks
* Scalable Nonlinear Embeddings for Semantic Category-Based Image Retrieval
* Scalable Person Re-identification: A Benchmark
* Scene-Domain Active Part Models for Object Representation
* Secrets of GrabCut and Kernel K-Means
* Secrets of Matrix Factorization: Approximations, Numerics, Manifold Optimization and Random Restarts
* See the Difference: Direct Pre-Image Reconstruction and Pose Estimation by Differentiating HOG
* Segment Graph Based Image Filtering: Fast Structure-Preserving Smoothing
* Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing
* Selecting Relevant Web Trained Concepts for Automated Event Retrieval
* Selective Encoding for Recognizing Unreliably Localized Faces
* Self-Calibration of Optical Lenses
* Self-Occlusions and Disocclusions in Causal Video Object Segmentation
* Self-Paced Multiple-Instance Learning Framework for Co-Saliency Detection, A
* Semantic Component Analysis
* Semantic Image Segmentation via Deep Parsing Network
* Semantic Pose Using Deep Networks Trained on Synthetic RGB-D
* Semantic Segmentation of RGBD Images with Mutex Constraints
* Semantic Segmentation with Object Clique Potential
* Semantic Video Entity Linking Based on Visual Content and Metadata
* Semantically-Aware Aerial Reconstruction from Multi-modal Data
* Semi-Supervised Normalized Cuts for Image Segmentation
* Semi-Supervised Zero-Shot Classification with Label Representation Learning
* Separating Fluorescent and Reflective Components by Using a Single Hyperspectral Image
* Sequence to Sequence -- Video to Text
* Shape Interaction Matrix Revisited and Robustified: Efficient Subspace Clustering with Corrupted and Incomplete Data
* Shell PCA: Statistical Shape Modelling in Shell Space
* Similarity Gaussian Process Latent Variable Model for Multi-modal Data Analysis
* Simpler Non-Parametric Methods Provide as Good or Better Results to Multiple-Instance Learning
* Simultaneous Deep Transfer Across Domains and Tasks
* Simultaneous Foreground Detection and Classification with Hybrid Features
* Simultaneous Local Binary Feature Learning and Encoding for Face Recognition
* Single Image 3D without a Single 3D Image
* Single Image Pop-Up from Discriminatively Learned Parts
* Single-Shot Specular Surface Reconstruction with Gonio-Plenoptic Imaging
* SOWP: Spatially Ordered and Weighted Patch Descriptor for Visual Tracking
* Sparse Dynamic 3D Reconstruction from Unsynchronized Videos
* Spatial Semantic Regularisation for Large Scale Object Detection
* Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification, A
* SpeDo: 6 DOF Ego-Motion Sensor Using Speckle Defocus Imaging
* SPM-BP: Sped-Up PatchMatch Belief Propagation for Continuous MRFs
* Square Localization for Efficient and Accurate Object Detection
* StereoSnakes: Contour Based Consistent Object Extraction for Stereo Images
* Storyline Representation of Egocentric Videos with an Applications to Story-Based Search
* Structural Kernel Learning for Large Scale Multiclass Object Co-detection
* Structure from Motion Using Structure-Less Resection
* Structured Feature Selection
* Structured Indoor Modeling
* Supervised Low-Rank Method for Learning Invariant Subspaces, A
* Synthesizing Illumination Mosaics from Internet Photo-Collections
* Task-Driven Feature Pooling for Image Classification
* Temporal Perception and Prediction in Ego-Centric Video
* Temporal Subspace Clustering for Human Motion Segmentation
* Text Flow: A Unified Text Detection System in Natural Scene Images
* Thin Structure Estimation with Curvature Regularization
* Top Rank Supervised Binary Coding for Visual Search
* Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
* Towards Pointless Structure from Motion: 3D Reconstruction and Camera Parameters from General 3D Curves
* Tracking-by-Segmentation with Online Gradient Boosting Decision Tree
* Training a Feedback Loop for Hand Pose Estimation
* TransCut: Transparent Object Segmentation from a Light-Field Image
* TRIC-track: Tracking by Regression with Incrementally Learned Cascades
* Two Birds, One Stone: Jointly Learning Binary Code for Large-Scale Face Image Retrieval and Attributes Prediction
* Uncovering Interactions and Interactors: Joint Estimation of Head, Body Orientation and F-Formations from Surveillance Videos
* Understanding and Diagnosing Visual Tracking Systems
* Understanding and Predicting Image Memorability at a Large Scale
* Understanding Deep Features with Computer-Generated Imagery
* Understanding Everyday Hands in Action from RGB-D Images
* Unified Multiplicative Framework for Attribute Learning, A
* Unsupervised Cross-Modal Synthesis of Subject-Specific Scans
* Unsupervised Domain Adaptation for Zero-Shot Learning
* Unsupervised Domain Adaptation with Imbalanced Cross-Domain Data
* Unsupervised Extraction of Video Highlights via Robust Recurrent Auto-Encoders
* Unsupervised Generation of a View Point Annotated Car Dataset from Videos
* Unsupervised Learning of Spatiotemporally Coherent Metrics
* Unsupervised Learning of Visual Representations Using Videos
* Unsupervised Object Discovery and Tracking in Video Collections
* Unsupervised Semantic Parsing of Video Collections
* Unsupervised Synchrony Discovery in Human Interaction
* Unsupervised Trajectory Clustering via Adaptive Multi-kernel-Based Shrinkage
* Unsupervised Tube Extraction Using Transductive Learning and Dense Trajectories
* Unsupervised Visual Representation Learning by Context Prediction
* Variational Depth Superresolution Using Example-Based Edge Representations
* Variational PatchMatch MultiView Reconstruction and Refinement
* Versatile Learning-Based 3D Temporal Tracker: Scalable, Robust, Online, A
* Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation, A
* Video Matting via Sparse and Low-Rank Representation
* Video Restoration Against Yin-Yang Phasing
* Video Segmentation with Just a Few Strokes
* Video Super-Resolution via Deep Draft-Ensemble Learning
* Visual Madlibs: Fill in the Blank Description Generation and Question Answering
* Visual Phrases for Exemplar Face Detection
* Visual Tracking with Fully Convolutional Networks
* Volumetric Bias in Segmentation and Reconstruction: Secrets and Solutions
* VQA: Visual Question Answering
* Wavefront Marching Method for Solving the Eikonal Equation on Cartesian Grids, A
* Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts
* Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation
* Weakly-Supervised Alignment of Video with Text
* Weakly-Supervised Structured Output Learning with Flexible and Latent Graphs Using High-Order Loss Functions
* Web-Scale Image Clustering Revisited
* Webly Supervised Learning of Convolutional Networks
* What Makes an Object Memorable?
* What Makes Tom Hanks Look Like Tom Hanks
* Where to Buy It: Matching Street Clothing Photos in Online Shops
* Wide Baseline Stereo Matching with Convex Bounded Distortion Constraints
* Wide-Area Image Geolocalization with Aerial Reference Imagery
* You are Here: Mimicking the Human Thinking Process in Reading Floor-Plans
* Zero-Shot Learning via Semantic Similarity Embedding
* 2D-Driven 3D Object Detection in RGB-D Images
* 3D Graph Neural Networks for RGBD Semantic Segmentation
* 3D Morphable Model of Craniofacial Shape and Texture Variation, A
* 3D Surface Detail Enhancement from a Single Normal Map
* 3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks
* 3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-Scale 3D Point Clouds
* Action Tubelet Detector for Spatio-Temporal Action Localization
* Active Decision Boundary Annotation with Deep Generative Models
* Active Learning for Human Pose Estimation
* Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors
* Adaptive RNN Tree for Large-Scale Human Action Recognition
* Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics
* Adversarial Examples for Semantic Segmentation and Object Detection
* Adversarial Image Perturbation for Privacy Protection: A Game Theory Perspective
* Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation
* Aesthetic Critiques Generation for Photos
* Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
* Am I a Baller? Basketball Performance Assessment from First-Person Videos
* AMAT: Medial Axis Transform for Natural Images
* AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture
* Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection
* Analysis of Visual Question Answering Algorithms, An
* Anchored Regression Networks Applied to Age Estimation and Super Resolution
* AnnArbor: Approximate Nearest Neighbors Using Arborescence Coding
* Anticipating Daily Intention Using On-wrist Motion Triggered Sensing
* AOD-Net: All-in-One Dehazing Network
* Approximate Grassmannian Intersections: Subspace-Valued Subspace Learning
* Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization
* Areas of Attention for Image Captioning
* Associative Domain Adaptation
* Attention-Aware Deep Reinforcement Learning for Video Face Recognition
* Attention-Based Multimodal Fusion for Video Description
* Attentive Semantic Video Generation Using Captions
* Attribute Recognition by Joint Recurrent Learning of Context and Correlation
* Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks
* Attributes2Classname: A Discriminative Model for Attribute-Based Unsupervised Zero-Shot Learning
* AutoDIAL: Automatic Domain Alignment Layers
* Automatic Content-Aware Projection for 360° Videos
* Automatic Spatially-Aware Fashion Concept Discovery
* BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography
* BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
* Be Your Own Prada: Fashion Synthesis with Structural Coherence
* Benchmarking and Error Diagnosis in Multi-instance Pose Estimation
* Benchmarking Single-Image Reflection Removal Algorithms
* Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
* Beyond Planar Symmetry: Modeling Human Perception of Reflection and Rotation Symmetries in the Wild
* Beyond Standard Benchmarks: Parameterizing Performance Evaluation in Visual Object Tracking
* BIER: Boosting Independent Embeddings Robustly
* Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources
* Blind Image Deblurring with Outlier Handling
* BlitzNet: A Real-Time Deep Network for Scene Understanding
* Blob Reconstruction Using Unilateral Second Order Gaussian Kernels with Application to High-ISO Long-Exposure Image Denoising
* Blur-Invariant Deep Learning for Blind-Deblurring
* BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera
* Boosting Image Captioning with Attributes
* Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?
* Bringing Background into the Foreground: Making All Classes Equal in Weakly-Supervised Video Semantic Segmentation
* CAD Priors for Accurate and Flexible Instance Reconstruction
* Camera Calibration by Global Constraints on the Motion of Silhouettes
* Cascaded Feature Network for Semantic Segmentation of RGB-D Images
* Catadioptric HyperSpectral Light Field Imaging
* CDTS: Collaborative Detection, Tracking, and Segmentation for Online Multiple Object Segmentation in Videos
* Centered Weight Normalization in Accelerating Training of Deep Neural Networks
* Chained Cascade Network for Object Detection
* Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection
* Channel Pruning for Accelerating Very Deep Neural Networks
* Characterizing and Improving Stability in Neural Style Transfer
* ChromaTag: A Colored Marker and Fast Detection Algorithm
* Class Rectification Hard Mining for Imbalanced Deep Learning
* Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation
* Coarse-Fine Network for Keypoint Localization, A
* Coherent Online Video Style Transfer
* Colored Point Cloud Registration Revisited
* Common Action Discovery and Localization in Unconstrained Videos
* Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos
* Composite Focus Measure for High Quality Depth Maps
* Compositional Human Pose Regression
* Compressive Quantization for Fast Object Instance Search in Videos
* Consensus Convolutional Sparse Coding
* Constrained Convolutional Sparse Coding for Parametric Based Reconstruction of Line Drawings
* Convergence Analysis of MAP Based Blur Kernel Estimation
* Convolutional Dictionary Learning via Local Processing
* Coordinating Filters for Faster Deep Neural Networks
* Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras
* CoupleNet: Coupling Global Structure with Local Parts for Object Detection
* CREST: Convolutional Residual Learning for Visual Tracking
* Cross-Modal Deep Variational Hashing
* Cross-View Asymmetric Metric Learning for Unsupervised Person Re-Identification
* Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
* Curriculum Dropout
* Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
* CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
* DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
* Decoder Network over Lightweight Reconstructed Feature for Fast Semantic Style Transfer
* Deep Adaptive Image Clustering
* Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval
* Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
* Deep Cropping via Attention Box Prediction and Aesthetics Assessment
* Deep Determinantal Point Process for Large-Scale Multi-label Classification
* Deep Direct Regression for Multi-oriented Scene Text Detection
* Deep Dual Learning for Semantic Image Segmentation
* Deep Facial Action Unit Recognition from Partially Labeled Data
* Deep Free-Form Deformation Network for Object-Mask Registration
* Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
* Deep Generative Adversarial Compression Artifact Removal
* Deep Globally Constrained MRFs for Human Pose Estimation
* Deep Growing Learning
* Deep Metric Learning with Angular Loss
* Deep Occlusion Reasoning for Multi-camera Multi-target Detection
* Deep Scene Image Classification with the MFAFVNet
* Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval
* Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
* DeepCD: Learning Deep Complementary Descriptors for Patch Representations
* DeepCoder: Semi-Parametric Variational Autoencoders for Automatic Facial Action Coding
* DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding
* Deeper, Broader and Artier Domain Generalization
* DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs
* Deeply-Learned Part-Aligned Representations for Person Re-identification
* DeepRoadMapper: Extracting Road Topology from Aerial Images
* DeepSetNet: Predicting Sets with Deep Neural Networks
* Deformable Convolutional Networks
* Deltille Grids for Geometric Camera Calibration
* Delving into Salient Object Subitizing and Detection
* DeNet: Scalable Real-Time Object Detection with Directed Sparse Sampling
* Dense and Low-Rank Gaussian CRFs Using Deep Embeddings
* Dense Non-rigid Structure-from-Motion and Shading with Unknown Albedos
* Dense-Captioning Events in Videos
* Depth and Image Restoration from Light Field in a Scattering Medium
* Depth Estimation Using Structured Light Flow: Analysis of Projected Pattern Flow on an Object's Surface
* Detail-Revealing Deep Video Super-Resolution
* Detailed Surface Geometry and Albedo Recovery from RGB-D Video under Natural Illumination
* Detect to Track and Track to Detect
* Detecting Faces Using Inside Cascaded Contextual CNN
* Directionally Convolutional Networks for 3D Shape Segmentation
* Discriminative View of MRF Pre-processing Algorithms, A
* Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus
* Domain-Adaptive Deep Network Compression
* Drone-Based Object Counting by Spatially Regularized Regional Proposal Network
* DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
* DSOD: Learning Deeply Supervised Object Detectors from Scratch
* Dual Motion GAN for Future-Flow Embedded Video Prediction
* Dual-Glance Model for Deciphering Social Relationships
* DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
* DualNet: Learn Complementary Features for Image Recognition
* Dynamic Label Graph Matching for Unsupervised Video Re-identification
* Dynamics Enhanced Multi-camera Motion Segmentation from Unsynchronized Videos
* Editable Parametric Dense Foliage from 3D Capture
* Efficient Algorithms for Moral Lineage Tracing
* Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map
* Efficient Global Illumination for Morphable Models
* Efficient Low Rank Tensor Ring Completion
* Efficient Online Local Metric Adaptation via Negative Samples for Person Re-identification
* Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules
* Embedding 3D Geometric Features for Rigid Object Part Segmentation
* Empirical Study of Language CNN for Image Captioning, An
* Encoder Based Lifelong Learning
* Encouraging LSTMs to Anticipate Actions Very Early
* End-to-End Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering
* End-to-End Learning of Geometry and Context for Deep Stereo Regression
* EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis
* Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks
* Ensemble Diffusion for Retrieval
* Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models
* Estimating Defocus Blur via Rank of Local Patches
* Exploiting Multi-grain Ranking Constraints for Precisely Searching Visually-similar Vehicles
* Exploiting Spatial Structure for Localizing Manipulated Image Regions
* Extreme Clicking for Efficient Object Annotation
* Face Sketch Matching via Coupled Deep Transform Learning
* Factorized Bilinear Models for Image Recognition
* Fashion Forward: Forecasting Visual Style in Fashion
* Fast Face-Swap Using Convolutional Neural Networks
* Fast Image Processing with Fully-Convolutional Networks
* Fast Multi-image Matching via Density-Based Clustering
* Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses
* FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
* Filter Selection for Hyperspectral Estimation
* Fine-Grained Recognition in the Wild: A Multi-task Domain Adaptation Approach
* First-Person Activity Forecasting with Online Inverse Reinforcement Learning
* FLaME: Fast Lightweight Mesh Estimation Using Variational Smoothing on Delaunay Graphs
* Flip-Invariant Motion Representation
* Flow-Guided Feature Aggregation for Video Object Detection
* Focal Loss for Dense Object Detection
* Focal Track: Depth and Accommodation with Oscillating Lens Deformation
* Focusing Attention: Towards Accurate Text Recognition in Natural Images
* Following Gaze in Video
* FoveaNet: Perspective-Aware Urban Scene Parsing
* From Point Clouds to Mesh Using Regression
* From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping
* From Square Pieces to Brick Walls: The Next Challenge in Solving Jigsaw Puzzles
* GANs for Biological Image Synthesis
* Generalized Orderless Pooling Performs Implicit Salient Matching
* Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs
* Generative Adversarial Networks Conditioned by Brain Signals
* Generative Model of People in Clothing, A
* Generative Modeling of Audible Shapes for Object Perception
* Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing, A
* Genetic CNN
* Geometric Framework for Statistical Analysis of Trajectories with Distinct Temporal Spans, A
* Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence
* Going Unconstrained with Rolling Shutter Deblurring
* GPLAC: Generalizing Vision-Based Robotic Skills Using Weakly Labeled Images
* Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
* Group Re-identification via Unsupervised Transfer of Sparse Features Encoding
* Guided Perturbations: Self-Corrective Behavior in Convolutional Neural Networks
* Hard-Aware Deeply Cascaded Embedding
* HashNet: Deep Learning to Hash by Continuation
* Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization
* Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding
* High Order Tensor Formulation for Convolutional Sparse Coding
* High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits
* High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference
* Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization
* Higher-Order Minimum Cost Lifted Multicuts for Motion Segmentation
* How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks)
* Human Pose Estimation Using Global and Local Normalization
* HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
* Identity-Aware Textual-Visual Matching with Latent Co-attention
* Illuminating Pedestrians via Simultaneous Detection and Segmentation
* Image Super-Resolution Using Dense Skip Connections
* Image-Based Localization Using LSTMs for Structured Feature Correlation
* Image2song: Song Retrieval via Bridging Image Content and Lyric Words
* Improved Image Captioning via Policy Gradient Optimization of SPIDEr
* Increasing CNN Robustness to Occlusions by Reducing Filter Support
* Incremental Learning of Object Detectors without Catastrophic Forgetting
* Infant Footprint Recognition
* Inferring and Executing Programs for Visual Reasoning
* Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach
* Interleaved Group Convolutions
* Interpretable Explanations of Black Boxes by Meaningful Perturbation
* Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention
* Interpretable Transformations with Encoder-Decoder Networks
* Intrinsic 3D Dynamic Surface Tracking based on Dynamic Ricci Flow and Teichmüller Map
* Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting
* Introspective Neural Networks for Generative Modeling
* Is Second-Order Information Helpful for Large-Scale Visual Recognition?
* Joint Adaptive Sparsity and Low-Rankness on the Fly: An Online Tensor Reconstruction Scheme for Video Denoising
* Joint Bi-layer Optimization for Single-Image Rain Streak Removal
* Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation
* Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge
* Joint Discovery of Object States and Manipulation Actions
* Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence
* Joint Intrinsic-Extrinsic Prior Model for Retinex, A
* Joint Layout Estimation and Global Multi-view Registration for Indoor Reconstruction
* Joint Learning of Object and Action Detectors
* Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos
* Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-identification
* Jointly Recognizing Object Fluents and Tasks in Egocentric Videos
* Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
* Large-Scale Image Retrieval with Attentive Deep Local Features
* Lattice Long Short-Term Memory for Human Action Recognition
* Learned Multi-patch Similarity
* Learned Watershed: End-to-End Learning of Seeded Segmentation
* Learning 3D Object Categories by Looking Around Them
* Learning a Recurrent Residual Fusion Network for Multimodal Matching
* Learning Action Recognition Model from Depth and Skeleton Videos
* Learning Background-Aware Correlation Filters for Visual Tracking
* Learning Bag-of-Features Pooling for Deep Convolutional Neural Networks
* Learning Blind Motion Deblurring
* Learning Compact Geometric Features
* Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
* Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals
* Learning Dense Facial Correspondences in Unconstrained Images
* Learning Discriminative Aggregation Network for Video-Based Face Recognition
* Learning Discriminative alpha-beta-Divergences for Positive Definite Matrices
* Learning Discriminative Data Fitting Functions for Blind Image Deblurring
* Learning Discriminative Latent Attributes for Zero-Shot Classification
* Learning Dynamic Siamese Network for Visual Object Tracking
* Learning Efficient Convolutional Networks through Network Slimming
* Learning Feature Pyramids for Human Pose Estimation
* Learning for Active 3D Mapping
* Learning from Noisy Labels with Distillation
* Learning from Video and Text via Large-Scale Discriminative Clustering
* Learning Gaze Transitions from Depth to Improve Video Saliency Estimation
* Learning Hand Articulations by Hallucinating Heat Distribution
* Learning High Dynamic Range from Outdoor Panoramas
* Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses
* Learning Long-Term Dependencies for Action Recognition with a Biologically-Inspired Deep Network
* Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition
* Learning Policies for Adaptive Tracking with Deep Feature Cascades
* Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems
* Learning Robust Visual-Semantic Embeddings
* Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
* Learning Spread-Out Local Feature Descriptors
* Learning the Latent Look: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
* Learning to Disambiguate by Asking Discriminative Questions
* Learning to Estimate 3D Hand Pose from Single RGB Images
* Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
* Learning to Push the Limits of Efficient FFT-Based Image Deconvolution
* Learning to Reason: End-to-End Module Networks for Visual Question Answering
* Learning to Super-Resolve Blurry Face and Text Images
* Learning to Synthesize a 4D RGBD Light Field from a Single Image
* Learning Uncertain Convolutional Features for Accurate Saliency Detection
* Learning Video Object Segmentation with Visual Memory
* Learning View-Invariant Features for Person Identification in Temporally Synchronized Videos Taken by Wearable Cameras
* Learning Visual Attention to Identify People with Autism Spectrum Disorder
* Learning Visual N-Grams from Web Data
* Learning-Based Cloth Material Recovery from Video
* Least Squares Generative Adversarial Networks
* Leveraging Weak Semantic Relevance for Complex Video Event Classification
* Lightweight Approach for On-the-Fly Reflectance Estimation, A
* Lightweight Single-Camera Polarization Compass with Covariance Estimation, A
* Linear Differential Constraints for Photo-Polarimetric Height Estimation
* Local-to-Global Point Cloud Registration Using a Dictionary of Viewpoint Descriptors
* Localizing Moments in Video with Natural Language
* Locally-Transferred Fisher Vectors for Texture Classification
* Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization
* Look, Listen and Learn
* Look, Perceive and Segment: Finding the Salient Objects in Images via Two-stream Fixation-Semantic CNNs
* Low Compute and Fully Parallel Computer Vision with HashMatch
* Low-Dimensionality Calibration through Local Anisotropic Scaling for Robust Hand Model Personalization
* Low-Rank Tensor Completion: A Pseudo-Bayesian Learning Approach
* Low-Shot Visual Recognition by Shrinking and Hallucinating Features
* Makeup-Go: Blind Reversion of Portrait Edit
* Making Minimal Solvers for Absolute Pose Estimation Compact and Robust
* Mapillary Vistas Dataset for Semantic Understanding of Street Scenes, The
* MarioQA: Answering Questions by Watching Gameplay Videos
* Mask R-CNN
* Material Editing Using a Physically Based Rendering Network
* Maximizing Rigidity Revisited: A Convex Programming Approach for Generic 3D Shape Reconstruction from Multiple Perspective Views
* MemNet: A Persistent Memory Network for Image Restoration
* Microfacet-Based Reflectance Model for Photometric Stereo with Highly Specular Surfaces, A
* MIHash: Online Hashing with Mutual Information
* MirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion Estimation
* Misalignment-Robust Joint Filter for Cross-Modal Image Pairs
* Modeling Urban Scenes from Pointclouds
* Modelling the Scene Dependent Imaging in Cameras with a Deep Neural Network
* MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
* Monocular 3D Human Pose Estimation by Predicting Depth on Joints
* Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames
* Monocular Free-Head 3D Gaze Tracking with Deep Learning and Geometry Constraints
* Monocular Video-Based Trailer Coupler Detection Using Multiplexer Convolutional Neural Network
* Moving Object Detection in Time-Lapse or Motion Trigger Image Sequences Using Low-Rank and Invariant Sparse Decomposition
* Multi-channel Weighted Nuclear Norm Minimization for Real Color Image Denoising
* Multi-label Image Recognition by Recurrently Discovering Attentional Regions
* Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection
* Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering
* Multi-scale Deep Learning Architectures for Person Re-identification
* Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection
* Multi-task Self-Supervised Visual Learning
* Multi-view Dynamic Shape Refinement Using Local Temporal Integration
* Multi-view Non-rigid Refinement and Normal Selection for High Quality 3D Reconstruction
* Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras, A
* Multimodal Deep Regression Bayesian Network for Affective Video Content Analyses, A
* Multimodal Gaussian Process Latent Variable Models with Harmonization
* MUTAN: Multimodal Tucker Fusion for Visual Question Answering
* Mutual Enhancement for Detection of Multiple Logos in Sports Videos
* Need for Speed: A Benchmark for Higher Frame Rate Object Tracking
* Neural Ctrl-F: Segmentation-Free Query-by-String Word Spotting in Handwritten Manuscript Collections
* Neural EPI-Volume Networks for Shape from Light Field
* Neural Person Search Machines
* No Fuss Distance Metric Learning Using Proxies
* No More Discrimination: Cross City Adaptation of Road Scene Segmenters
* Non-convex Rank/Sparsity Regularization and Local Minima
* Non-linear Convolution Filters for CNN-Based Learning
* Non-Markovian Globally Consistent Multi-object Tracking
* Non-rigid Object Tracking via Deformable Patches Using Shape-Preserved KCF and Level Sets
* Non-uniform Blind Deblurring by Reblurring
* Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning
* Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition, A
* Object-Level Proposals
* Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs
* Offline Handwritten Signature Modeling and Verification Based on Archetypal Analysis
* On-demand Learning for Deep Image Restoration
* One Network to Solve Them All: Solving Linear Inverse Problems Using Deep Projection Models
* Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism
* Online Real-Time Multiple Spatiotemporal Action Localisation and Prediction
* Online Robust Image Alignment via Subspace Learning from Gradient Orientations
* Online Video Deblurring via Dynamic Temporal Blending Network
* Online Video Object Detection Using Association LSTM
* Open Set Domain Adaptation
* Open Vocabulary Scene Parsing
* Optimal Transformation Estimation with Semantic Cues
* Optimal Transportation Based Univariate Neuroimaging Index, An
* Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification
* PanNet: A Deep Network Architecture for Pan-Sharpening
* Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking
* Parameter-Free Lens Distortion Calibration of Central Cameras
* PathTrack: Fast Trajectory Annotation with Path Supervision
* Paying Attention to Descriptions Generated by Image Captioning Models
* Performance Guaranteed Network Acceleration via High-Order Residual Quantization
* Personalized Cinemagraphs Using Semantic Understanding and Collaborative Learning
* Personalized Image Aesthetics
* Photographic Image Synthesis with Cascaded Refinement Networks
* Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
* Pixel Recursive Super Resolution
* Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks
* Playing for Benchmarks
* Point Set Registration with Global-Local Correspondence and Transformation Estimation
* PolyFit: Polygonal Surface Reconstruction from Point Clouds
* Polynomial Solvers for Saturated Ideals
* Pose Guided RGBD Feature Learning for 3D Object Pose Estimation
* Pose Knows: Video Forecasting by Generating Pose Futures, The
* Pose-Driven Deep Convolutional Model for Person Re-identification
* Pose-Invariant Face Alignment with a Single CNN
* PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
* Practical and Efficient Multi-view Matching
* Practical Projective Structure from Motion (P2SfM)
* Predicting Deeper into the Future of Semantic Segmentation
* Predicting Human Activities Using Stochastic Grammar
* Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning
* Predictor Combination at Test Time
* Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow
* Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption
* Probabilistic Structure from Motion with Objects (PSfMO)
* ProbFlow: Joint Optical Flow and Uncertainty Estimation
* Progressive Large Scale-Invariant Image Matching in Scale Space
* PUnDA: Probabilistic Unsupervised Domain Adaptation for Knowledge Transfer Across Visual Categories
* Quantitative Evaluation of Confidence Measures in a Machine Learning World
* Quasiconvex Plane Sweep for Triangulation with Outliers
* Query-Guided Regression Network with Context Policy for Phrase Grounding
* R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
* Range Loss for Deep Face Recognition with Long-Tailed Training Data
* RankIQA: Learning from Rankings for No-Reference Image Quality Assessment
* Raster-to-Vector: Revisiting Floorplan Transformation
* Ray Space Features for Plenoptic Structure-from-Motion
* RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation
* Read-Write Memory Network for Movie Story Understanding, A
* Real Time Eye Gaze Tracking with 3D Deformable Eye-Face Model
* Real-Time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor
* Real-Time Monocular Pose Estimation of 3D Objects Using Temporally Consistent Local Color Histograms
* Realistic Dynamic Facial Textures from a Single Image Using GANs
* Reasoning About Fine-Grained Attribute Phrases Using Reference Games
* Recognition of Action Units in the Wild with Deep Nets and a New Global-Local Loss
* Reconfiguring the Imaging Pipeline for Computer Vision
* Reconstruction-Based Disentanglement for Pose-Invariant Face Recognition
* Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection
* Recurrent Color Constancy
* Recurrent Models for Situation Recognition
* Recurrent Multimodal Interaction for Referring Image Segmentation
* Recurrent Scale Approximation for Object Detection in CNN
* Recurrent Topic-Transition GAN for Visual Paragraph Generation
* Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition
* Referring Expression Generation and Comprehension via Attributes
* Reflectance Capture Using Univariate Sampling of BRDFs
* Refractive Structure-from-Motion Through a Flat Refractive Interface
* Region-Based Correspondence Between 3D Shapes via Spatially Smooth Biclustering
* Regional Interactive Image Segmentation Networks
* Representation Learning by Learning to Count
* Rethinking Reprojection: Closing the Loop for Pose-Aware Shape Reconstruction from a Single Image
* Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework, A
* Revisiting Cross-Channel Information Transfer for Chromatic Aberration Correction
* Revisiting IM2GPS in the Deep Learning Era
* Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
* RGB-Infrared Cross-Modality Person Re-identification
* RMPE: Regional Multi-person Pose Estimation
* Robust Hand Pose Estimation during the Interaction with an Unknown Object
* Robust Kronecker-Decomposable Component Analysis for Low-Rank Modeling
* Robust Object Tracking Based on Temporal and Spatial Deep Networks
* Robust Pseudo Random Fields for Light-Field Stereo Matching
* Robust Video Super-Resolution with Learned Temporal Dynamics
* Rolling Shutter Correction in Manhattan World
* Rolling-Shutter-Aware Differential SfM and Image Rectification
* RoomNet: End-to-End Room Layout Estimation
* Rotation Equivariant Vector Field Networks
* Rotational Subgroup Voting and Pose Clustering for Robust 3D Object Recognition
* RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos
* SafetyNet: Detecting and Rejecting Adversarial Examples Robustly
* Saliency Pattern Detection by Ranking Structured Trees
* Sampling Matters in Deep Embedding Learning
* SBGAR: Semantics Based Group Activity Recognition
* Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields
* Scale-Adaptive Convolutions for Scene Parsing
* ScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond
* Scaling the Scattering Transform: Deep Hybrid Networks
* Scene Categorization with Spectral Features
* Scene Graph Generation from Objects, Phrases and Region Captions
* Scene Parsing with Global Context Embedding
* SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?
* SCNet: Learning Semantic Correspondence
* See the Glass Half Full: Reasoning About Liquid Containers, Their Volume and Content
* SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
* Segmentation-Aware Convolutional Networks Using Local Attention Masks
* Self-Balanced Min-Cut Algorithm for Image Clustering, A
* Self-Organized Text Detection with Minimal Post-processing via Border Learning
* Self-Paced Kernel Estimation for Robust Blind Image Deblurring
* Self-Supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos
* Semantic Image Synthesis via Adversarial Learning
* Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images
* Semantic Line Detection and Its Applications
* Semantic Video CNNs Through Representation Warping
* Semantically Informed Multiview Surface Refinement
* Semi Supervised Semantic Segmentation Using Generative Adversarial Network
* Semi-Global Weighted Least Squares in Image Filtering
* SGN: Sequential Grouping Networks for Instance Segmentation
* Shadow Detection with Conditional Generative Adversarial Networks
* Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks
* SHaPE: A Novel Graph Theoretic Algorithm for Making Consensus-Based Decisions in Person Re-identification Systems
* Should We Encode Rain Streaks in Video as Deterministic or Stochastic?
* Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner
* Side Information in Robust Principal Component Analysis: Algorithms and Applications
* Simple Yet Effective Baseline for 3D Human Pose Estimation, A
* Simultaneous Detection and Removal of High Altitude Clouds from an Image
* Single Image Action Recognition Using Semantic Body Part Actions
* Single Shot Text Detector with Regional Attention
* Situation Recognition with Graph Neural Networks
* Sketching with Style: Visual Search with Sketches and Aesthetic Context
* Smart Mining for Deep Metric Learning
* Soft Proposal Networks for Weakly Supervised Object Localization
* Soft-NMS: Improving Object Detection with One Line of Code
* Something Something Video Database for Learning and Evaluating Visual Common Sense, The
* SORT: Second-Order Response Transform for Visual Recognition
* Space-Time Localization and Mapping
* Sparse Exact PGA on Riemannian Manifolds
* Spatial Memory for Context Reasoning in Object Detection
* Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
* Spatio-Temporal Person Retrieval via Natural Language Queries
* Spatiotemporal Modeling for Crowd Counting in Videos
* Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition, A
* Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
* SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again
* SSH: Single Stage Headless Face Detector
* StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
* Stagewise Refinement Model for Detecting Salient Objects in Images, A
* Stepwise Metric Promotion for Unsupervised Video Person Re-identification
* Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras
* Structure-Measure: A New Way to Evaluate Foreground Maps
* Structured Attentions for Visual Question Answering
* SuBiC: A Supervised, Structured Binary Code for Image Search
* Sublabel-Accurate Discretization of Nonconvex Free-Discontinuity Problems
* Submodular Trajectory Optimization for Aerial 3D Scanning
* SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition
* Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences
* Super-Trajectory for Video Segmentation
* Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector
* Supplementary Meta-Learning: Towards a Dynamic Model for Deep Neural Networks
* Surface Normals in the Wild
* Surface Registration via Foliation
* SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis
* SVDNet for Pedestrian Retrieval
* Synergy between Face Alignment and Tracking via Discriminative Global Consensus Optimization
* S^3FD: Single Shot Scale-Invariant Face Detector
* Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras
* TALL: Temporal Activity Localization via Language Query
* Temporal Action Detection with Structured Segment Networks
* Temporal Context Network for Activity Localization in Videos
* Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection
* Temporal Generative Adversarial Nets with Singular Value Clipping
* Temporal Non-volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition
* Temporal Shape Super-Resolution by Intra-frame Motion Encoding Using High-fps Structured Light
* Temporal Superpixels Based on Proximity-Weighted Patch Matching
* Temporal Tessellation: A Unified Approach for Video Analysis
* Tensor RPCA by Bayesian CP Factorization with Complex Noise
* ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
* TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal
* TorontoCity: Seeing the World with a Million Eyes
* Toward Perceptually-Consistent Stereo: A Scanline Study
* Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach
* Towards a Unified Compositional Model for Visual Pattern Modeling
* Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images
* Towards Context-Aware Interaction Recognition for Visual Relationship Detection
* Towards Diverse and Natural Image Descriptions via a Conditional GAN
* Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks
* Towards Large-Pose Face Frontalization in the Wild
* Towards More Accurate Iris Recognition Using Deeply Learned Spatially Corresponding Features
* Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
* Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies
* Training Deep Networks to be Spatially Sensitive
* Transferring Objects: Joint Inference of Container and Human Pose
* Transformed Low-Rank Model for Line Pattern Noise Removal
* Transitive Invariance for Self-Supervised Visual Representation Learning
* Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
* Truncating Wide Networks Using Binary Tree Architectures
* Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos
* TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
* Turning Corners into Cameras: Principles and Methods
* Two Stream Siamese Convolutional Neural Network for Person Re-identification, A
* Two-Phase Learning for Weakly Supervised Object Localization
* Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images, A
* Understanding and Mapping Natural Beauty
* Understanding Low- and High-Level Contributions to Fixation Prediction
* Unified Deep Supervised Domain Adaptation and Generalization
* Unified Model for Near and Remote Sensing, A
* Universal Adversarial Perturbations Against Semantic Image Segmentation
* Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro
* Unmasking the Abnormal Events in Video
* Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
* Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation
* Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations
* Unsupervised Action Discovery and Localization in Videos
* Unsupervised Adaptation for Deep Stereo
* Unsupervised Creation of Parameterized Avatars
* Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
* Unsupervised Learning from Video to Detect Foreground Objects in Single Images
* Unsupervised Learning of Important Objects from First-Person Videos
* Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings
* Unsupervised Learning of Stereo Matching
* Unsupervised Object Segmentation in Video by Efficient Selection of Highly Probable Positive Features
* Unsupervised Representation Learning by Sorting Sequences
* Unsupervised Video Understanding by Reconciliation of Posture Similarities
* Using Sparse Elimination for Solving Minimal Problems in Computer Vision
* VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization
* Video Deblurring via Semantic Segmentation and Pixel-Wise Non-linear Kernel
* Video Fill In the Blank Using LR/RL LSTMs with Spatial-Temporal Attentions
* Video Frame Interpolation via Adaptive Separable Convolution
* Video Frame Synthesis Using Deep Voxel Flow
* Video Reflection Removal Through Spatio-Temporal Optimization
* Video Scene Parsing with Predictive Feature Learning
* View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data
* Visual Forecasting by Imitating Dynamics in Natural Sequences
* Visual Odometry for Pixel Processor Arrays
* Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation
* Visual Semantic Planning Using Deep Successor Representations
* Visual Transformation Aided Contrastive Learning for Video-Based Kinship Verification
* Volumetric Flow Estimation for Incompressible Fluids Using the Stationary Stokes Equations
* VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
* VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
* Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution
* Weakly Supervised Learning of Deep Metrics for Stereo Reconstruction
* Weakly Supervised Manifold Learning for Dense Semantic Object Correspondence
* Weakly Supervised Object Localization Using Things and Stuff Transfer
* Weakly Supervised Summarization of Web Videos
* Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting
* Weakly-Supervised Learning of Visual Relations
* WeText: Scene Text Detection under Weak Supervision
* What Actions are Needed for Understanding Human Actions in Videos?
* What is Around the Camera?
* What will Happen Next? Forecasting Player Moves in Sports Videos
* When Unsupervised Domain Adaptation Meets Tensor Representations
* WordSup: Exploiting Word Annotations for Character Based Text Detection
* Zero-Order Reverse Filtering
* 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
* 3D Face Modeling From Diverse Raw Scan Data
* 3D Instance Segmentation via Multi-Task Metric Learning
* 3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions
* 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
* 3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers
* 3D-LaneNet: End-to-End 3D Multiple Lane Detection
* 3D-RelNet: Joint Object and Relational Network for 3D Prediction
* 3DPeople: Modeling the Geometry of Dressed Humans
* 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation
* A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image
* ABD-Net: Attentive but Diverse Person Re-Identification
* Accelerate CNN via Recursive Bayesian Pruning
* Accelerate Learning of Deep Hashing With Gradient Attention
* Accelerated Gravitational Point Set Alignment With Altered Physical Laws
* Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
* ACE: Adapting to Changing Environments for Semantic Segmentation
* ACFNet: Attentional Class Feature Network for Semantic Segmentation
* ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching
* ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
* Action Assessment by Joint Relation Graphs
* Action Recognition With Spatial-Temporal Discriminative Filter Banks
* Active Learning for Deep Detection Neural Networks
* Adaptative Inference Cost With Convolutional Neural Mixture Models
* AdaptIS: Adaptive Instance Selection Network
* Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks
* Adaptive Context Network for Scene Parsing
* Adaptive Density Map Generation for Crowd Counting
* Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
* Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
* AdaTransform: Adaptive Data Transformation
* Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets
* Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
* Adversarial Defense via Learning to Generate Diverse Attacks
* Adversarial Feedback Loop
* Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition
* Adversarial Learning With Margin-Based Triplet Embedding Regularization
* Adversarial Representation Learning for Text-to-Image Matching
* Adversarial Robustness vs. Model Compression, or Both?
* AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos
* advPattern: Physical-World Attacks on Deep Person Re-Identification via Adversarially Transformable Patterns
* AFD-Net: Aggregated Feature Difference Learning for Cross-Spectral Image Patch Matching
* Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation
* Agile Depth Sensing Using Triangulation Light Curtains
* AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation
* Alarm System for Segmentation Algorithm Based on Shape Model, An
* Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings
* Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision
* Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
* Aligning Latent Spaces for 3D Hand Pose Estimation
* AM-LFS: AutoML for Loss Function Search
* AMASS: Archive of Motion Capture As Surface Shapes
* AMP: Adaptive Masked Proxies for Few-Shot Segmentation
* Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction
* Anchor Diffusion for Unsupervised Video Object Segmentation
* Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty
* Anomaly Detection in Video Sequence With Appearance-Motion Correspondence
* Approximated Bilinear Modules for Temporal Modeling
* ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal
* Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query
* Asymmetric Non-Local Neural Networks for Semantic Segmentation
* Asynchronous Single-Photon 3D Imaging
* Attacking Optical Flow
* Attention Augmented Convolutional Networks
* Attention Bridging Network for Knowledge Transfer
* Attention on Attention for Image Captioning
* Attention-Aware Polarity Sensitive Embedding for Affective Image Retrieval
* Attention-Based Autism Spectrum Disorder Screening With Privileged Modality
* Attentional Feature-Pair Relation Networks for Accurate Face Recognition
* Attentional Neural Fields for Crowd Counting
* AttentionRNN: A Structured Spatial Attention Mechanism
* AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism
* Attract or Distract: Exploit the Margin of Open Set
* Attribute Attention for Semantic Disambiguation in Zero-Shot Learning
* Attribute Manipulation Generative Adversarial Networks for Fashion Images
* Attribute-Driven Spontaneous Motion in Unpaired Image Translation
* Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints
* Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
* Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification
* AutoDispNet: Improving Disparity Estimation With AutoML
* AutoFocus: Efficient Multi-Scale Inference
* AutoGAN: Neural Architecture Search for Generative Adversarial Networks
* Automatic and Robust Skull Registration Based on Discrete Uniformization
* AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations
* AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation
* BAE-NET: Branched Autoencoder for Shape Co-Segmentation
* Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
* Batch DropBlock Network for Person Re-Identification and Beyond
* Batch Weight for Domain Adaptation With Mass Shift
* Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement
* Bayesian Adaptive Superpixel Segmentation
* Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition
* Bayesian Loss for Crowd Count Estimation With Point Supervision
* Bayesian Optimization Framework for Neural Network Compression, A
* Bayesian Optimized 1-Bit CNNs
* Bayesian Relational Memory for Semantic Visual Navigation
* Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
* Better and Faster: Exponential Loss for Image Patch Matching
* Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection
* Beyond Cartesian Representations for Local Descriptors
* Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification
* Bidirectional One-Shot Unsupervised Domain Mapping
* Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks
* Bilinear Attention Networks for Person Retrieval
* Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search
* Block Annotation: Better Image Annotation With Sub-Image Decomposition
* BMN: Boundary-Matching Network for Temporal Action Proposal Generation
* Boosting Few-Shot Visual Learning With Self-Supervision
* Bottleneck Potentials in Markov Random Fields
* Boundary-Aware Feature Propagation for Scene Segmentation
* Boundless: Generative Adversarial Networks for Image Extension
* Bridging the Domain Gap for Ground-to-Aerial Image Matching
* Bridging the Gap Between Detection and Tracking: A Unified Approach
* Budget-Aware Adapters for Multi-Domain Learning
* C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection
* C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion
* Calibration of Axial Fisheye Cameras Through Generic Virtual Central Models
* Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty
* CAMEL: A Weakly Supervised Learning Framework for Histopathology Image Segmentation
* Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image
* Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays, A
* CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
* CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
* Canonical Surface Mapping via Geometric Cycle Consistency
* Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
* CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing
* CARAFE: Content-Aware ReAssembly of FEatures
* Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion
* Cascaded Parallel Filtering for Memory-Efficient Image-Based Localization
* CCNet: Criss-Cross Attention for Semantic Segmentation
* CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation
* CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark
* CenterNet: Keypoint Triplets for Object Detection
* CFSNet: Toward a Controllable Feature Space for Image Restoration
* Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning
* CIIDefence: Defeating Adversarial Attacks by Fusing Class-Specific Image Inpainting and Image Denoising
* Closed-Form Optimal Two-View Triangulation Based on Angular Errors
* Closed-Form Solution to Universal Style Transfer, A
* ClothFlow: A Flow-Based Model for Clothed Person Generation
* Cluster Alignment With a Teacher for Unsupervised Domain Adaptation
* Clustered Object Detection in Aerial Images
* ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation
* Co-Evolutionary Compression for Unpaired Image Translation
* Co-Mining: Deep Face Recognition With Noisy Labels
* Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification
* Co-Separating Sounds of Visual Objects
* COCO-GAN: Generation by Parts via Conditional Coordinating
* Coherent Semantic Attention for Image Inpainting
* Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning
* Compact Trilinear Interaction for Visual Question Answering
* CompenNet++: End-to-End Full Projector Compensation
* CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition
* Composite Shape Modeling via Latent Space Factorization
* Compositional Video Prediction
* Comprehensive Overhaul of Feature Distillation, A
* Computational Hyperspectral Imaging Based on Dimension-Discriminative Low-Rank Tensor Recovery
* Conditional Coupled Generative Adversarial Networks for Zero-Shot Domain Adaptation
* Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples With Applications to Neuroimaging
* Confidence Regularized Self-Training
* Consensus Maximization Tree Search Revisited
* Conservative Wasserstein Training for Pose Estimation
* Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach
* Content and Style Disentanglement for Artistic Style Transfer
* Context-Aware Emotion Recognition Networks
* Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data
* Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation
* Contextual Attention for Hand Detection in the Wild
* Continual Learning by Asymmetric Loss Approximation With Single-Side Overestimation
* Controllable Artistic Text Style Transfer via Shape-Matching GAN
* Controllable Attention for Structured Layered Video Decomposition
* Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network
* Controlling Neural Networks via Energy Dissipation
* Convex Relaxations for Consensus and Non-Minimal Problems in 3D Vision
* Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function
* Convolutional Approximations to the General Non-Line-of-Sight Imaging Operator
* Convolutional Character Networks
* Convolutional Sequence Generation for Skeleton-Based Action Synthesis
* Copy-and-Paste Networks for Deep Video Inpainting
* Correlation Congruence for Knowledge Distillation
* Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations
* Counterfactual Critic Multi-Agent Training for Scene Graph Generation
* Counting With Focus for Free
* Creativity Inspired Zero-Shot Learning
* Cross View Fusion for 3D Human Pose Estimation
* Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation
* Cross-Domain Adaptation for Animal Pose Estimation
* Cross-View Policy Learning for Street Navigation
* Cross-X Learning for Fine-Grained Visual Categorization
* Crowd Counting With Deep Structured Scale Integration Network
* Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation
* CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
* DADA: Depth-Aware Domain Adaptation in Semantic Segmentation
* DAGMapper: Learning to Map by Discovering Lane Topology
* DANet: Divergent Activation for Weakly Supervised Object Localization
* Data-Free Learning of Student Networks
* Data-Free Quantization Through Weight Equalization and Bias Correction
* Dataset of Multi-Illumination Images in the Wild, A
* DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals
* DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better
* DeCaFA: Deep Convolutional Cascade for Face Alignment in the Wild
* DeceptionNet: Network-Driven Domain Randomization
* Decoupled 3D Facial Shape Model by Adversarial Training, A
* Deep Appearance Maps
* Deep Blind Hyperspectral Image Fusion
* Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement
* Deep Closest Point: Learning Representations for Point Cloud Registration
* Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding
* Deep Comprehensive Correlation Mining for Image Clustering
* Deep Constrained Dominant Sets for Person Re-Identification
* Deep Contextual Attention for Human-Object Interaction Detection
* Deep Cybersickness Predictor Based on Brain Signal Analysis for Virtual Reality Contents, A
* Deep Depth From Aberration Map
* Deep Elastic Networks With Model Selection for Multi-Task Learning
* Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module
* Deep Floor Plan Recognition Using a Multi-Task Network With Room-Boundary-Guided Attention
* Deep Graphical Feature Learning for the Feature Matching Problem
* Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces
* Deep Hough Voting for 3D Object Detection in Point Clouds
* Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval
* Deep Learning for Light Field Saliency Detection
* Deep Learning for Seeing Through Window With Raindrops
* Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks
* Deep Meta Functionals for Shape Representation
* Deep Meta Learning for Real-Time Target-Aware Visual Tracking
* Deep Meta Metric Learning
* Deep Metric Learning With Tuplet Margin Loss
* Deep Multi-Model Fusion for Single-Image Dehazing
* Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition
* Deep Non-Rigid Structure From Motion
* Deep Optics for Monocular Depth Estimation and 3D Object Detection
* Deep Parametric Indoor Lighting Estimation
* Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification
* Deep Residual Learning in the JPEG Transform Domain
* Deep Restoration of Vintage Photographs From Scanned Halftone Prints
* Deep Self-Learning From Noisy Labels
* Deep Single-Image Portrait Relighting
* Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications
* Deep Step Pattern Representation for Multimodal Retinal Image Registration, A
* Deep Supervised Hashing With Anchor Graph
* Deep Tensor ADMM-Net for Snapshot Compressive Imaging
* DeepGCNs: Can GCNs Go As Deep As CNNs?
* DeepHuman: 3D Human Reconstruction From a Single Image
* DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch
* DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration
* Defending Against Universal Perturbations With Shared Adversarial Training
* Deformable Surface Tracking by Graph Matching
* Delay Metric for Video Object Detection: What Average Precision Fails to Tell, A
* Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
* Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach
* DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing
* DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare
* Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints
* Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras
* Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection
* Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes
* Detecting Photoshopped Faces by Scripting Photoshop
* Detecting the Unexpected via Image Resynthesis
* Detecting Unseen Visual Relations Using Analogies
* DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks
* DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction
* Differentiable Kernel Evolution
* Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks
* Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
* Differential Volumetric Approach to Multi-View Photometric Stereo, A
* Digging Into Self-Supervised Monocular Depth Estimation
* Dilated Convolutional Neural Networks for Sequential Manifold-Valued Data
* DiscoNet: Shapes Learning on Disconnected Manifolds for 3D Editing
* Discrete Laplace Operator Estimation for Dynamic 3D Reconstruction
* Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification
* Discriminative Feature Transformation for Occluded Pedestrian Detection
* Discriminatively Learned Convex Models for Set Based Face Recognition
* Disentangled Image Matting
* Disentangling Monocular 3D Object Detection
* Disentangling Propagation and Generation for Video Prediction
* Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning
* Distillation-Based Training for Multi-Exit Architectures
* Distilling Knowledge From a Deep Pose Regressor Network
* DistInit: Learning Video Representations Without a Single Labeled Video
* Diverse Image Synthesis From Semantic Layouts via Conditional IMLE
* Diversity With Cooperation: Ensemble Methods for Few-Shot Classification
* DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation
* Domain Adaptation for Semantic Segmentation With Maximum Squares Loss
* Domain Adaptation for Structured Output via Discriminative Patch Representations
* Domain Intersection and Domain Difference
* Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data
* Domain-Adaptive Single-View 3D Reconstruction
* DPOD: 6D Pose Object Detector and Refiner
* Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles
* Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution
* Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation
* DSConv: Efficient Convolution Operator
* DSIC: Deep Stereo Image Compression
* Dual Adversarial Inference for Text-to-Image Synthesis
* Dual Attention Matching for Audio-Visual Event Localization
* Dual Directed Capsule Network for Very Low Resolution Image Recognition
* Dual Student: Breaking the Limits of the Teacher in Semi-Supervised Learning
* DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer
* Dual-Path Model With Adaptive Attention for Vehicle Re-Identification, A
* DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense
* Dynamic Anchor Feature Selection for Single-Shot Object Detection
* Dynamic Context Correspondence Network for Semantic Alignment
* Dynamic Curriculum Learning for Imbalanced Data Classification
* Dynamic Graph Attention for Referring Expression Comprehension
* Dynamic Kernel Distillation for Efficient Pose Estimation in Videos
* Dynamic Multi-Scale Filters for Semantic Segmentation
* Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior
* Dynamic Points Agglomeration for Hierarchical Point Sets Learning
* Dynamic-Net: Tuning the Objective Without Re-Training for Synthesis Tasks
* DynamoNet: Dynamic Action and Motion Network
* Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network
* Efficient and Robust Registration on the 3D Special Euclidean Group
* Efficient Learning on Point Clouds With Basis Point Sets
* Efficient Segmentation: Learning Downsampling Near Semantic Boundaries
* Efficient Solution to the Homography-Based Relative Pose Problem With a Common Reference Direction, An
* EGNet: Edge Guidance Network for Salient Object Detection
* Ego-Pose Estimation and Forecasting As Real-Time PD Control
* Elaborate Monocular Point and Line SLAM With Robust Initialization
* ELF: Embedded Localisation of Features in Pre-Trained CNN
* EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association
* Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution
* Embodied Amodal Recognition: Learning to Move to Perceive Objects
* Empirical Study of Spatial Attention Mechanisms in Deep Networks, An
* Employing Deep Part-Object Relationships for Salient Object Detection
* EMPNet: Neural Localisation and Mapping Using Embedded Memory Points
* End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans
* End-to-End Hand Mesh Recovery From a Monocular RGB Image
* End-to-End Learning for Graph Decomposition
* End-to-End Learning of Representations for Asynchronous Event-Based Data
* End-to-End Wireframe Parsing
* Enforcing Geometric Constraints of Virtual Normal for Depth Prediction
* Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval
* Enhancing Adversarial Example Transferability With an Intermediate Level Attack
* Enhancing Low Light Videos by Exploring High Sensitivity Camera Noise
* Enriched Feature Guided Refinement Network for Object Detection
* Entangled Transformer for Image Captioning
* EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
* Episodic Training for Domain Generalization
* Equivariant Multi-View Networks
* ERL-Net: Entangled Representation Learning for Single Image De-Raining
* Escaping Plato's Cave: 3D Shape From Adversarial Rendering
* Estimating the Fundamental Matrix Without Point Correspondences With Application to Transmission Imaging
* EvalNorm: Estimating Batch Normalization Statistics for Evaluation
* Evaluating Robustness of Deep Image Super-Resolution Against Adversarial Attacks
* Event-Based Motion Segmentation by Motion Compensation
* Everybody Dance Now
* Evolving Space-Time Neural Architectures for Videos
* Expectation-Maximization Attention Networks for Semantic Segmentation
* Expert Sample Consensus Applied to Camera Re-Localization
* Explaining Neural Networks Semantically and Quantitatively
* Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data
* Explicit Shape Encoding for Real-Time Instance Segmentation
* Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
* Exploiting Temporal Consistency for Real-Time Video Depth Estimation
* Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
* Exploring Randomly Wired Neural Networks for Image Recognition
* Exploring the Limitations of Behavior Cloning for Autonomous Driving
* Extreme View Synthesis
* FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos
* Face Alignment With Kernel Density Deep Neural Network
* Face De-Occlusion Using 3D Morphable Model and Generative Adversarial Network
* Face Video Deblurring Using 3D Facial Priors
* Face-to-Parameter Translation for Game Character Auto-Creation
* FaceForensics++: Learning to Detect Manipulated Facial Images
* FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second
* Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition
* FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking
* Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
* Fashion++: Minimal Edits for Outfit Improvement
* Fast and Accurate One-Stage Approach to Visual Grounding, A
* Fast and Practical Neural Architecture Search
* Fast Computation of Content-Sensitive Superpixels and Supervoxels Using Q-Distances
* Fast Image Restoration With Multi-Bin Trainable Linear Units
* Fast Object Detection in Compressed Video
* Fast Point R-CNN
* Fast Video Object Segmentation via Dynamic Targeting Network
* Fast-deepKCF Without Boundary Effect
* FCOS: Fully Convolutional One-Stage Object Detection
* FDA: Feature Disruptive Attack
* Feature Weighting and Boosting for Few-Shot Segmentation
* Few-Shot Adaptive Gaze Estimation
* Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
* Few-Shot Generalization for Single-Image 3D Reconstruction via Priors
* Few-Shot Image Recognition With Knowledge Transfer
* Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training
* Few-Shot Learning With Global Class Representations
* Few-Shot Object Detection via Feature Reweighting
* Few-Shot Unsupervised Image-to-Image Translation
* Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings
* Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization
* FiNet: Compatible and Diverse Fashion Image Inpainting
* Fingerspelling Recognition in the Wild With Iterative Visual Attention
* Flare in Interference-Based Hyperspectral Cameras
* Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path
* Floorplan-Jigsaw: Jointly Estimating Scene Layout and Aligning Partial Scans
* Fooling Network Interpretation in Image Classification
* Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification
* ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image
* Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation
* FrameNet: Learning Local Canonical Frames of 3D Surfaces From a Single RGB Image
* Free-Form Image Inpainting With Gated Convolution
* Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN
* FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images
* From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer
* From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason
* FSGAN: Subject Agnostic Face Swapping and Reenactment
* Fully Convolutional Geometric Features
* Fully Convolutional Pixel Adaptive Image Denoiser
* FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On
* G3raphGround: Graph-Based Language Grounding
* GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
* GAN-Based Projector for Faster Recovery With Convergence Guarantees in Linear Inverse Problems
* GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions
* GANalyze: Toward Visual Definitions of Cognitive Image Properties
* GarNet: A Two-Stream Network for Fast and Accurate 3D Cloth Draping
* Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
* Gated2Depth: Real-Time Dense Lidar From Gated Images
* Gaussian Affinity for Max-Margin Class Imbalanced Learning
* Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving
* Gaze360: Physically Unconstrained Gaze Estimation in the Wild
* Generating Diverse and Descriptive Image Captions Using Visual Paraphrases
* Generating Easy-to-Understand Referring Expressions for Target Identifications
* Generative Adversarial Minority Oversampling
* Generative Adversarial Networks for Extreme Learned Image Compression
* Generative Adversarial Training for Weakly Supervised Cloud Matting
* Generative Modeling for Small-Data Object Detection
* Generative Multi-View Human Action Recognition
* GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images
* Geometric Disentanglement for Generative Latent Shape Models
* Geometry Normalization Networks for Accurate Scene Text Detection
* Geometry-Inspired Decision-Based Attack, A
* GeoStyle: Discovering Fashion Trends and Events
* GLAMpoints: Greedily Learned Accurate Match Points
* Global Feature Guided Local Pooling
* Global-Local Temporal Representations for Video Person Re-Identification
* GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition
* Goal-Driven Sequential Data Abstraction
* GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection
* GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild
* GradNet: Gradient-Guided Network for Visual Object Tracking
* Graph Convolutional Networks for Temporal Action Localization
* Graph-Based Framework to Bridge Movies and Synopses, A
* Graph-Based Object Classification for Neuromorphic Vision Sensing
* GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion
* Gravity as a Reference for Estimating a Person's Height From Video
* GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing
* Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss
* Grounded Human-Object Interaction Hotspots From Video
* Group-Wise Deep Object Co-Segmentation With Co-Attention Recurrent Neural Network
* Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
* GSLAM: A General SLAM Framework and Benchmark
* Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks
* Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
* Guided Image-to-Image Translation With Bi-Directional Feature Transformation
* Guided Super-Resolution As Pixel-to-Pixel Transformation
* Habitat: A Platform for Embodied AI Research
* HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
* Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition With CNNs
* HarDNet: A Low Memory Traffic Network
* HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
* HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions
* HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation
* Hiding Video in Audio via Reversible Generative Models
* Hierarchical Encoding of Sequential Data With Compact and Sub-Linear Storage Cost
* Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation
* Hierarchical Self-Attention Network for Action Localization in Videos
* Hierarchical Shot Detector
* Hierarchy Parsing for Image Captioning
* Hilbert-Based Generative Defense for Adversarial Examples
* HiPPI: Higher-Order Projected Power Iterations for Scalable Multi-Matching
* HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images
* Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense
* HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
* Homography From Two Orientation- and Scale-Covariant Features
* How Do Neural Networks See Depth in Single Images?
* HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
* Human Attention in Image Captioning: Dataset and Analysis
* Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation
* Human Motion Prediction via Spatio-Temporal Inpainting
* Human Uncertainty Makes Classification More Robust
* Human-Aware Motion Deblurring
* Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features
* Hyperspectral Image Reconstruction Using Deep External and Internal Learning
* Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos
* IL2M: Class Incremental Learning With Dual Memory
* Image Aesthetic Assessment Based on Pairwise Comparison - A Unified Approach to Score Regression, Binary Classification, and Personalization
* Image Generation From Small Datasets via Batch Statistics Adaptation
* Image Inpainting With Learnable Bidirectional Attention Maps
* Image Synthesis From Reconfigurable Layout and Style
* Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?
* Imitation Learning for Human Pose Prediction
* IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things
* Implicit Surface Representations As Layers in Neural Networks
* Improved Conditional VRNNs for Video Prediction
* Improved Techniques for Training Adaptive Deep Networks
* Improving Adversarial Robustness via Guided Complement Entropy
* Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
* Incremental Class Discovery for Semantic Segmentation With RGBD Sensing
* Incremental Learning Using Conditional Adversarial Networks
* Indices Matter: Learning to Index for Deep Image Matting
* Information Entropy Based Feature Pooling for Convolutional Neural Networks
* InGAN: Capturing and Retargeting the DNA of a Natural Image
* InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting
* Instance-Guided Context Rendering for Cross-Domain Person Re-Identification
* Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression
* Integral Object Mining via Online Attention Accumulation
* Interactive Sketch Fill: Multiclass Sketch-to-Image Translation
* Internal Learning Approach to Video Inpainting, An
* Interpolated Convolutional Networks for 3D Point Cloud Understanding
* Invariant Information Clustering for Unsupervised Image Classification and Segmentation
* Is an Affine Constraint Needed for Affine Subspace Clustering?
* Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization
* Joint Acne Image Grading and Counting via Label Distribution Learning
* Joint Demosaicking and Denoising by Fine-Tuning of Bursts of Raw Images
* Joint Embedding of 3D Scan and CAD Objects
* Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking
* Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation
* Joint Learning of Semantic Alignment and Object Landmark Detection
* Joint Monocular 3D Vehicle Detection and Tracking
* Joint Optimization for Cooperative Image Captioning
* Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes
* Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning
* Jointly Aligning Millions of Images With Deep Penalised Reconstruction Congealing
* JPEG Artifacts Reduction via Deep Convolutional Sparse Coding
* K-Best Transformation Synchronization
* Kernel Modeling Super-Resolution on Real Low-Resolution Images
* Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters
* Knowledge Distillation via Route Constrained Optimization
* KPConv: Flexible and Deformable Convolution for Point Clouds
* Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation
* LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
* lambda-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement
* Language Features Matter: Effective Language Representations for Vision-Language Tasks
* Language-Agnostic Visual-Semantic Embeddings
* Language-Conditioned Graph Networks for Relational Reasoning
* LAP-Net: Level-Aware Progressive Network for Image Dehazing
* Laplace Landmark Localization
* Large-Scale Tag-Based Font Retrieval With Generative Feature Learning
* Larger Norm More Transferable: An Adaptive Feature Norm Approach for Unsupervised Domain Adaptation
* Layout-Induced Video Representation for Recognizing Agent-in-Place Actions
* LayoutVAE: Stochastic Scene Layout Generation From a Label Set
* Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
* Learnable Triangulation of Human Pose
* Learned Representation for Scalable Vector Graphics, A
* Learned Video Compression
* Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization
* Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking
* Learning Across Tasks and Domains
* Learning an Effective Equivariant 3D Descriptor Without Supervision
* Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo
* Learning Combinatorial Embedding Networks for Deep Graph Matching
* Learning Compositional Neural Information Fusion for Human Parsing
* Learning Compositional Representations for Few-Shot Recognition
* Learning Deep Priors for Image Dehazing
* Learning Discriminative Model Prediction for Tracking
* Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning
* Learning Filter Basis for Convolutional Neural Network Compression
* Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization
* Learning Implicit Generative Models by Matching Perceptual Features
* Learning Joint 2D-3D Representations for Depth Completion
* Learning Lightweight Lane Detection CNNs by Self Attention Distillation
* Learning Local Descriptors With a CDF-Based Dynamic Soft Margin
* Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
* Learning Meshes for Dense Visual SLAM
* Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection
* Learning Object-Specific Distance From a Monocular Image
* Learning Perspective Undistortion of Portraits
* Learning Propagation for Arbitrarily-Structured Data
* Learning Relationships for Multi-View 3D Object Recognition
* Learning Rich Features at High-Speed for Single-Shot Object Detection
* Learning Robust Facial Landmark Detection via Hierarchical Structured Ensemble
* Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition
* Learning Shape Templates With Structured Implicit Functions
* Learning Similarity Conditions Without Explicit Supervision
* Learning Single Camera Depth Estimation Using Dual-Pixels
* Learning Spatial Awareness to Improve Crowd Counting
* Learning Temporal Action Proposals With Fewer Labels
* Learning the Model Update for Siamese Trackers
* Learning to Assemble Neural Module Tree Networks for Visual Grounding
* Learning to Caption Images Through a Lifetime by Asking Questions
* Learning to Collocate Neural Modules for Image Captioning
* Learning to Discover Novel Visual Categories via Deep Transfer Clustering
* Learning to Find Common Objects Across Few Image Collections
* Learning to Jointly Generate and Separate Reflections
* Learning to Paint With Model-Based Deep Reinforcement Learning
* Learning to Rank Proposals for Object Detection
* Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop
* Learning to Reconstruct 3D Manhattan Wireframes From a Single Image
* Learning to See Moving Objects in the Dark
* Learning Trajectory Dependencies for Human Motion Prediction
* Learning Two-View Correspondences and Geometry Using Order-Aware Network
* Learning With Average Precision: Training Image Retrieval With a Listwise Loss
* Learning With Unsure Data for Medical Image Diagnosis
* Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection
* Lifelong GAN: Continual Learning for Conditional Image Generation
* Linearized Multi-Sampling for Differentiable Image Transformation
* Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration
* LIP: Local Importance-Based Pooling
* Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
* Live Face De-Identification in Video
* Local Aggregation for Unsupervised Learning of Visual Embeddings
* Local Relation Networks for Image Recognition
* Local Supports Global: Deep Camera Relocalization With Sequence Enhancement
* Localization of Deep Inpainting Using High-Pass Fully Convolutional Network
* LogBarrier Adversarial Attack: Making Effective Use of Decision Boundary Information, The
* Looking to Relations for Future Trajectory Forecast
* LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis
* M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis
* M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
* Make a Face: Towards Arbitrary High Fidelity Face Manipulation
* Making History Matter: History-Advantage Sequence Training for Visual Dialog
* Making the Invisible Visible: Action Recognition Through Walls and Occlusions
* Many Task Learning With Task Routing
* Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
* Mask-Guided Attention Network for Occluded Pedestrian Detection
* Mask-ShadowGAN: Learning to Remove Shadows From Unpaired Data
* Maximum-Margin Hamming Hashing
* Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection
* Memory-Based Neighbourhood Embedding for Visual Recognition
* Mesh R-CNN
* Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning
* Meta-Learning to Detect Rare Objects
* Meta-Sim: Learning to Generate Synthetic Datasets
* MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning
* MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences
* Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings
* MIC: Mining Interclass Characteristics for Improved Metric Learning
* Micro-Baseline Structured Light
* Minimum Delay Object Detection From Video
* Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images
* Mixed High-Order Attention Network for Person Re-Identification
* Mixture-Kernel Graph Attention Network for Situation Recognition
* MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding
* Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning
* Moment Matching for Multi-Source Domain Adaptation
* MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence
* Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes
* Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking
* Monocular Neural Image Based Rendering With Continuous View Control
* Monocular Piecewise Depth Estimation in Dynamic Scenes by Exploiting Superpixel Relations
* MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
* Mop Moiré Patterns Using MopNet
* Motion Guided Attention for Video Salient Object Detection
* Moulding Humans: Non-Parametric 3D Human Shape Estimation From Single Images
* Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments
* Multi-Adversarial Faster-RCNN for Unrestricted Object Detection
* Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition
* Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction
* Multi-Class Part Parsing With Joint Boundary-Semantic Awareness
* Multi-Garment Net: Learning to Dress 3D People From Images
* Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting
* Multi-Modality Latent Interaction Network for Visual Question Answering
* Multi-Stage Pathological Image Classification Using Semantic Segmentation
* Multi-View Image Fusion
* Multi-View Stereo by Temporal Nonparametric Fusion
* Multimodal Style Transfer via Graph Cuts
* Multinomial Distribution Learning for Effective Neural Architecture Search
* MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input
* MVP Matching: A Maximum-Value Perfect Matching for Mining Hard Samples, With Application to Person Re-Identification
* MVSCRF: Learning Multi-View Stereo With Conditional Random Fields
* Neighborhood Preserving Hashing for Scalable Video Retrieval
* Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
* Neural Inter-Frame Compression for Video Coding
* Neural Inverse Rendering of an Indoor Scene From a Single Image
* Neural Network for Detailed Human Depth Estimation From a Single Image, A
* Neural Re-Simulation for Generating Bounces in Single Images
* Neural Turtle Graphics for Modeling City Road Layouts
* Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses
* New Convex Relaxations for MRF Inference With Unknown Graphs
* NLNL: Negative Learning for Noisy Labels
* No Fear of the Dark: Image Retrieval Under Varying Illumination Conditions
* No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
* nocaps: novel object captioning at scale
* Noise Flow: Noise Modeling With Conditional Normalizing Flows
* Non-Local ConvLSTM for Video Compression Artifact Reduction
* Non-Local Intrinsic Decomposition With Near-Infrared Priors
* Non-Local Recurrent Neural Memory for Supervised Sequence Modeling
* Normalized Wasserstein for Mixture Distributions With Applications in Adversarial Learning and Domain Adaptation
* Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts
* NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection
* Novel Unsupervised Camera-Aware Domain Adaptation Framework for Person Re-Identification, A
* O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks
* Object Guided External Memory Network for Video Object Detection
* Object-Aware Instance Labeling for Weakly Supervised Object Detection
* Object-Driven Multi-Layer Scene Decomposition From a Single Image
* Objects365: A Large-Scale, High-Quality Dataset for Object Detection
* Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network
* Occlusion-Aware Networks for 3D Human Pose Estimation in Video
* Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning
* Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics
* Omni-Scale Feature Learning for Person Re-Identification
* OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching
* On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
* On Network Design Spaces for Visual Recognition
* On the Design of Black-Box Adversarial Examples by Leveraging Gradient-Free Optimization and Operator Splitting Method
* On the Efficacy of Knowledge Distillation
* On the Global Optima of Kernelized Adversarial Representation Learning
* On the Over-Smoothing Problem of CNN Based Disparity Estimation
* Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
* One-Shot Neural Architecture Search via Self-Evaluated Template Network
* Onion-Peel Networks for Deep Video Completion
* Online Hyper-Parameter Learning for Auto-Augmentation Strategy
* Online Model Distillation for Efficient Video Inference
* Online Unsupervised Learning of the 3D Kinematic Structure of Arbitrary Rigid Bodies
* OperatorNet: Recovering 3D Shapes From Difference Operators
* Optimizing Network Structure for 3D Human Pose Estimation
* Optimizing the F-Measure for Threshold-Free Salient Object Detection
* Order-Aware Generative Modeling Using the 3D-Craft Dataset
* Order-Preserving Wasserstein Discriminant Analysis
* Orientation-Aware Semantic Segmentation on Icosahedron Spheres
* Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild
* P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo
* PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data
* PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment
* Parametric Majorization for Data-Driven Energy Minimization Methods
* Pareto Meets Huber: Efficiently Avoiding Poor Minima in Robust Estimation
* PARN: Position-Aware Relation Networks for Few-Shot Learning
* Patchwork: A Patch-Wise Attention Network for Efficient Object Detection and Segmentation in Video Streams
* Perceptual Deep Depth Super-Resolution
* Permutation-Invariant Feature Restructuring for Correlation-Aware Image Set-Based Recognition
* Person Search by Text Attribute Query As Zero-Shot Learning
* Person-in-WiFi: Fine-Grained Person Perception Using WiFi
* Personalized Fashion Design
* Perspective-Guided Convolution Networks for Crowd Counting
* Photo-Realistic Facial Details Synthesis From Single Image
* Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks
* Photorealistic Style Transfer via Wavelet Transforms
* Phrase Localization Without Paired Training Examples
* Physical Adversarial Textures That Fool Visual Object Tracking
* Physics-Based Rendering for Improving Robustness to Rain
* PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction
* PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
* Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation
* Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images
* Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation
* PLMP: Point-Line Minimal Problems in Complete Multi-View Visibility
* POD: Practical Object Detection With Scale-Sensitive Network
* Point-Based Multi-View Stereo Network
* Point-to-Point Video Generation
* PointAE: Point Auto-Encoder for 3D Statistical Shape and Texture Modelling
* PointCloud Saliency Maps
* PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows
* Polarimetric Relative Pose Estimation
* Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection
* Pose-Guided Feature Alignment for Occluded Person Re-Identification
* PR Product: A Substitute for Inner Product in Neural Networks
* PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings
* Predicting 3D Human Dynamics From Video
* Predicting the Future: A Jointly Learnt Model for Action Anticipation
* Presence-Only Geographical Priors for Fine-Grained Image Classification
* Prior Guided Dropout for Robust Visual Localization in Dynamic Environments
* Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation
* Privacy Preserving Image Queries for Camera Localization
* Pro-Cam SSfM: Projector-Camera System for Structure and Spectral Reflectance From Motion
* Probabilistic Deep Ordinal Regression Based on Gaussian Processes
* Probabilistic Face Embeddings
* Program-Guided Image Manipulators
* Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation
* Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations
* Progressive Reconstruction of Visual Structure for Image Inpainting
* Progressive Sparse Local Attention for Video Object Detection
* Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm
* Proximal Mean-Field for Neural Network Quantization
* PU-GAN: A Point Cloud Upsampling Adversarial Network
* PuppetGAN: Cross-Domain Image Manipulation by Demonstration
* Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method
* Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation
* QUARCH: A New Quasi-Affine Reconstruction Stratum From Vague Relative Camera Orientation Knowledge
* Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World
* Quaternion-Based Certifiably Optimal Solution to the Wahba Problem With Outliers, A
* Racial Faces in the Wild: Reducing Racial Bias by Information Maximization Adaptation Network
* RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect
* RANet: Ranking Attention Network for Fast Video Object Segmentation
* RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution
* Re-ID Driven Localization Refinement for Person Search
* Real Image Denoising With Feature Attention
* Reasoning About Human-Object Interactions Through Dual Attention Networks
* Reciprocal Multi-Layer Subspace Learning for Multi-View Clustering
* Recognizing Part Attributes With Insufficient Data
* Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification
* Recurrent U-Net for Resource-Constrained Segmentation
* Recursive Cascaded Networks for Unsupervised Medical Image Registration
* Recursive Visual Sound Separation Using Minus-Plus Net
* Reflective Decoding Network for Image Captioning
* Relation Distillation Networks for Video Object Detection
* Relation Parsing Neural Network for Human-Object Interaction Detection
* Relation-Aware Graph Attention Network for Visual Question Answering
* Relational Attention Network for Crowd Counting
* RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes
* Remote Heart Rate Measurement From Highly Compressed Facial Videos: An End-to-End Deep Learning Solution With Video Enhancement
* RepPoints: Point Set Representation for Object Detection
* Rescan: Inductive Instance Segmentation for Indoor RGBD Scans
* Resolving 3D Human Pose Ambiguities With 3D Scene Constraints
* Resource Constrained Neural Network Architecture Search: Will a Submodularity Assumption Help?
* Restoration of Non-Rigidly Distorted Underwater Images Using a Combination of Compressive Sensing and Local Polynomial Image Representations
* Rethinking ImageNet Pre-Training
* Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective
* Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data
* Revisiting Radial Distortion Absolute Pose
* RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment
* RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
* Robust Change Captioning
* Robust Learning Approach to Domain Adaptive Object Detection, A
* Robust Motion Segmentation From Pairwise Matches
* Robust Multi-Modality Multi-Object Tracking
* Robust Person Re-Identification by Modelling Feature Uncertainty
* Robust Variational Bayesian Point Set Registration
* S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals
* S4L: Self-Supervised Semi-Supervised Learning
* Saliency-Guided Attention Network for Image-Sentence Matching
* Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization
* Sampling-Free Epistemic Uncertainty Estimation Using Approximated Variance Propagation
* SANet: Scene Agnostic Network for Camera Localization
* SBSGAN: Suppression of Inter-Domain Background Shift for Person Re-Identification
* SC-FEGAN: Face Editing Generative Adversarial Network With User's Sketch and Color
* Scalable Place Recognition Under Appearance Change for Autonomous Driving
* Scalable Verified Training for Provably Robust Image Classification
* Scale-Aware Trident Networks for Object Detection
* Scaling and Benchmarking Self-Supervised Visual Representation Learning
* Scaling Object Detection by Transferring Classification Weights
* Scaling Recurrent Models via Orthogonal Approximations in Tensor Trains
* Scene Graph Prediction With Limited Labels
* Scene Text Visual Question Answering
* SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation
* Scoot: A Perceptual Metric for Facial Sketches
* SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects
* SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition
* Searching for MobileNetV3
* Second-Order Non-Local Attention Networks for Person Re-Identification
* See-Through-Text Grouping for Referring Image Segmentation
* Seeing Motion in the Dark
* Seeing What a GAN Cannot Generate
* SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering
* SegSort: Segmentation by Discriminative Sorting of Segments
* Selective Sparse Sampling for Fine-Grained Image Recognition
* Selectivity or Invariance: Boundary-Aware Salient Object Detection
* Self-Critical Attention Learning for Person Re-Identification
* Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation
* Self-Guided Network for Fast Image Denoising
* Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification
* Self-Supervised Deep Depth Denoising
* Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation
* Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera
* Self-Supervised Monocular Depth Hints
* Self-Supervised Moving Vehicle Tracking With Stereo Sound
* Self-Supervised Representation Learning From Multi-Domain Data
* Self-Supervised Representation Learning via Neighborhood-Relational Encoding
* Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection
* Self-Training With Progressive Augmentation for Unsupervised Cross-Domain Person Re-Identification
* Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers
* Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data
* Semantic Stereo Matching With Pyramid Cost Volumes
* Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval
* Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation
* SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
* Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis
* Semi-Supervised Domain Adaptation via Minimax Entropy
* Semi-Supervised Learning by Augmented Distribution Alignment
* Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer
* Semi-Supervised Pedestrian Instance Synthesis and Detection With Mutual Reinforcement
* Semi-Supervised Skin Detection by Network With Mutual Guidance
* Semi-Supervised Video Salient Object Detection Using Pseudo-Labels
* SENSE: A Shared Encoder Network for Scene-Flow Estimation
* Seq-SG2SL: Inferring Semantic Layout From Scene Graph Through Sequence to Sequence Learning
* Sequence Level Semantics Aggregation for Video Object Detection
* Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry
* Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
* Shadow Removal via Shadow Image Decomposition
* Shape Reconstruction Using Differentiable Projections and Deep Priors
* Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images
* Shapeglot: Learning Language for Shape Differentiation
* ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
* Sharpen Focus: Learning With Attention Separability and Consistency
* ShellNet: Efficient Point Cloud Convolutional Neural Networks Using Concentric Shells Statistics
* Siamese Networks: The Tale of Two Manifolds
* SID4VAM: A Benchmark Dataset With Synthetic Images for Visual Attention Modeling
* Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation
* SILCO: Show a Few Images, Localize the Common Object
* Similarity-Preserving Knowledge Distillation
* Simultaneous Multi-View Instance Detection With Learned Geometric Soft-Constraints
* SinGAN: Learning a Generative Model From a Single Natural Image
* Single-Network Whole-Body Pose Estimation
* Single-Stage Multi-Person Pose Machines
* Situational Fusion of Visual Representation for Visual Navigation
* Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds
* Skimming-Perusal Tracking: A Framework for Real-Time and Robust Long-Term Tracking
* SkyScapes: Fine-Grained Semantic Understanding of Aerial Scenes
* SlowFast Networks for Video Recognition
* Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning
* SME-Net: Sparse Motion Estimation for Parametric Video Prediction Through Reinforcement Learning
* SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning
* Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning
* SoftTriple Loss: Deep Metric Learning Without Triplet Sampling
* Solving Vision Problems via Filtering
* Sound of Motions, The
* SpaceNet MVOI: A Multi-View Overhead Imagery Dataset
* Sparse and Imperceivable Adversarial Attacks
* SparseMask: Differentiable Connectivity Learning for Dense Image Prediction
* Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos
* Spatial-Temporal Relation Networks for Multi-Object Tracking
* SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
* Spatio-Temporal Filter Adaptive Network for Video Deblurring
* Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading
* Spatiotemporal Feature Residual Propagation for Action Prediction
* Specifying Object Attributes and Relations in Interactive Scene Generation
* Spectral Feature Transformation for Person Re-Identification
* Spectral Regularization for Combating Mode Collapse in GANs
* SPGNet: Semantic Prediction Guidance for Scene Parsing
* SPLINE-Net: Sparse Photometric Stereo Through Lighting Interpolation and Normal Estimation Networks
* SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
* SRM: A Style-Based Recalibration Module for Convolutional Neural Networks
* SROBB: Targeted Perceptual Loss for Single Image Super-Resolution
* SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
* SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation
* Stacked Cross Refinement Network for Edge-Aware Salient Object Detection
* StartNet: Online Detection of Action Start in Untrimmed Videos
* STD: Sparse-to-Dense 3D Object Detector for Point Cloud
* STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction
* STM: SpatioTemporal and Motion Encoding for Action Recognition
* Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization
* Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference
* Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels
* Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection
* Structured Prediction Helps 3D Human Motion Modelling
* StructureFlow: Image Inpainting via Structure-Aware Appearance Flow
* Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering
* Surface Networks via General Covers
* Surface Normals and Shape From Water
* SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval
* Switchable Whitening for Deep Representation Learning
* Sym-Parameterized Dynamic Inference for Mixed-Domain Image Translation
* Symmetric Cross Entropy for Robust Learning With Noisy Labels
* Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning
* Symmetry-Constrained Rectification Network for Scene Text Recognition
* SynDeMo: Synergistic Deep Feature Alignment for Joint Learning of Depth and Ego-Motion
* Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss
* Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
* Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis
* TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo
* Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower
* TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
* Task-Driven Modular Networks for Zero-Shot Compositional Learning
* Task2Vec: Task Embedding for Meta-Learning
* Teacher Guided Architecture Search
* Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection
* Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
* Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
* Temporal Knowledge Propagation for Image-to-Video Person Re-Identification
* Temporal Recurrent Networks for Online Action Detection
* Temporal Structure Mining for Weakly Supervised Action Detection
* TensorMask: A Foundation for Dense Object Segmentation
* Tex2Shape: Detailed Full Human Body Geometry From a Single Image
* TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting
* TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts
* Texture Fields: Learning Texture Representations in Function Space
* TexturePose: Supervising Human Mesh Estimation With Texture Consistency
* Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images In the Wild
* Through-Wall Human Mesh Recovery Using Radio Signals
* ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices
* Topological Map Extraction From Overhead Images
* Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning
* Tour of Convolutional Networks Guided by Linear Interpreters, A
* Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model
* Towards Adversarially Robust Object Detection
* Towards Bridging Semantic Gap to Improve Semantic Segmentation
* Towards High-Resolution Salient Object Detection
* Towards Interpretable Face Recognition
* Towards Interpretable Object Detection by Unfolding Latent Structures
* Towards Latent Attribute Discovery From Triplet Similarities
* Towards Multi-Pose Guided Virtual Try-On Network
* Towards Photorealistic Reconstruction of Highly Multiplexed Lensless Images
* Towards Precise End-to-End Weakly Supervised Object Detection Network
* Towards Unconstrained End-to-End Text Spotting
* Towards Unsupervised Image Captioning With Shared Multimodal Embeddings
* Toyota Smarthome: Real-World Activities of Daily Living
* Tracking Without Bells and Whistles
* Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs, The
* Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning
* Transductive Learning for Zero-Shot Object Detection
* Transferability and Hardness of Supervised Classification Tasks
* Transferable Contrastive Network for Generalized Zero-Shot Learning
* Transferable Representation Learning in Vision-and-Language Navigation
* Transferable Semi-Supervised 3D Object Detection From RGB-D Data
* Transformable Bottleneck Networks
* TRB: A Novel Triplet Representation for Understanding 2D Human Body
* TSM: Temporal Shift Module for Efficient Video Understanding
* Two-Stream Action Recognition-Oriented Video Super-Resolution
* U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps
* U4D: Unsupervised 4D Dynamic Scene Understanding
* UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation
* Uncertainty Modeling of Contextual-Connections Between Tracklets for Unconstrained Video-Based Face Recognition
* Uncertainty-Aware Audiovisual Activity Recognition Using Deep Bayesian Variational Inference
* Unconstrained Foreground Object Search
* Unconstrained Motion Deblurring for Dual-Lens Cameras
* Understanding Deep Networks via Extremal Perturbations and Smooth Masks
* Understanding Generalized Whitening and Coloring Transform for Universal Style Transfer
* Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
* Universal Adversarial Perturbation via Prior Driven Uncertainty Approximation
* Universal Perturbation Attack Against Image Retrieval
* Universal Semi-Supervised Semantic Segmentation
* Universally Slimmable Networks and Improved Training Techniques
* Unpaired Image Captioning via Scene Graph Alignments
* Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck
* Unsupervised 3D Reconstruction Networks
* Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM
* Unsupervised Deep Learning for Structured Shape Matching
* Unsupervised Domain Adaptation via Regularized Conditional Alignment
* Unsupervised Graph Association for Person Re-Identification
* Unsupervised High-Resolution Depth Learning From Videos With Dual Networks
* Unsupervised Learning of Landmarks by Descriptor Vector Exchange
* Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network
* Unsupervised Multi-Task Feature Learning on Point Clouds
* Unsupervised Neural Quantization for Compressed-Domain Similarity Search
* Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy
* Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning
* Unsupervised Pre-Training of Image Features on Non-Curated Data
* Unsupervised Procedure Learning via Joint Dynamic Summarization
* Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis
* Unsupervised Video Interpolation Using Cycle Consistency
* UprightNet: Geometry-Aware Camera Orientation Estimation From Single Images
* USIP: Unsupervised Stable Interest Point Detection From 3D Point Clouds
* Variable Rate Deep Image Compression With a Conditional Autoencoder
* Variational Adversarial Active Learning
* Variational Few-Shot Learning
* Variational Uncalibrated Photometric Stereo Under General Lighting
* VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
* Vehicle Re-Identification in Aerial Imagery: Dataset and Approach
* Vehicle Re-Identification With Viewpoint-Aware Metric Learning
* Very Long Natural Scenery Image Prediction by Outpainting
* ViCo: Word Embeddings From Visual Co-Occurrences
* Video Classification With Channel-Separated Convolutional Networks
* Video Compression With Rate-Distortion Autoencoders
* Video Face Clustering With Unknown Number of Clusters
* Video Instance Segmentation
* Video Object Segmentation Using Space-Time Memory Networks
* VideoBERT: A Joint Model for Video and Language Representation Learning
* VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability
* View Confusion Feature Learning for Person Re-Identification
* View Independent Generative Adversarial Network for Novel View Synthesis
* View N-Gram Network for 3D Object Retrieval
* View-Consistent 4D Light Field Superpixel Segmentation
* View-LSTM: Novel-View Video Synthesis Through View Decomposition
* ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning
* Vision-Infused Deep Audio Inpainting
* Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions
* Visual Semantic Reasoning for Image-Text Matching
* Visualization of Convolutional Neural Networks for Monocular Depth Estimation
* Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery
* VrR-VG: Refocusing Visually-Relevant Relationships
* VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation
* VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation
* Wasserstein GAN With Quadratic Transport Cost
* Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning
* Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution
* Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection
* Weakly Supervised Energy-Based Learning for Action Segmentation
* Weakly Supervised Fine Label Classifier Enhanced by Coarse Supervision, A
* Weakly Supervised Object Detection With Segmentation Collaboration
* Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks
* Weakly-Supervised Action Localization With Background Modeling
* What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance
* What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
* What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing
* What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention
* Where Is My Mirror?
* Why Does a Visual Question Have Different Answers?
* WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving
* WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection
* X-Section: Cross-Section Prediction for Enhanced RGB-D Fusion
* xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera
* XRAI: Better Attributions Through Regions
* YOLACT: Real-Time Instance Segmentation
* Zero-Shot Anticipation for Instructional Activities
* Zero-Shot Emotion Recognition via Affective Structural Embedding
* Zero-Shot Grounding of Objects From Natural Language Queries
* Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
1076 for ICCV19

* 3D Building Reconstruction from Monocular Remote Sensing Images
* 3D Human Pose Estimation with Spatial and Temporal Transformers
* 3D Human Texture Estimation from a Single Image with Transformers
* 3D Local Convolutional Neural Networks for Gait Recognition
* 3D Shape Generation and Completion through Point-Voxel Diffusion
* 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics
* 3DeepCT: Learning Volumetric Scattering Tomography of Clouds
* 3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces
* 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations
* 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
* 4D Cloud Scattering Tomography
* 4D-Net for Learned Multi-Modal Alignment
* 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface
* A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
* AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network
* ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
* Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
* ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding
* ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot
* Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search
* Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
* Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
* Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings
* Active Learning for Deep Object Detection via Probabilistic Modeling
* Active Learning for Lane Detection: A Knowledge Distillation Approach
* Active Universal Domain Adaptation
* AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
* AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
* AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
* AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
* Adaptive Adversarial Network for Source-free Domain Adaptation
* Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
* Adaptive confidence thresholding for monocular depth estimation
* Adaptive Convolutions with Per-pixel Dynamic Filter Atom
* Adaptive Curriculum Learning
* Adaptive Focus for Efficient Video Recognition
* Adaptive Graph Convolution for Point Cloud Analysis
* Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
* Adaptive Label Noise Cleaning with Meta-Supervision for Deep Face Recognition
* Adaptive Surface Normal Constraint for Depth Estimation
* Adaptive Surface Reconstruction with Multiscale Convolutional Kernels
* Adaptive Unfolding Total Variation Network for Low-Light Image Enhancement
* AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
* Admix: Enhancing the Transferability of Adversarial Attacks
* ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
* AdvDrop: Adversarial Attack to DNNs by Dropping Information
* Adversarial Attack on Deep Cross-Modal Hamming Retrieval
* Adversarial Attacks are Reversible with Natural Supervision
* Adversarial Attacks On Multi-Agent Communication
* Adversarial Example Detection Using Latent Neighborhood Graph
* Adversarial Robustness for Unsupervised Domain Adaptation
* Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
* Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
* AdvRush: Searching for Adversarially Robust Neural Architectures
* AESOP: Abstract Encoding of Stories, Objects, and Pictures
* AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting
* Aggregation with Feature Detection
* AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
* Aha! Adaptive History-driven Attack for Decision-based Black-box Models
* AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
* AINet: Association Implantation for Superpixel Segmentation
* Airbert: In-Domain Pretraining for Vision-and-Language Navigation
* ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity
* Aligning Latent and Image Spaces to Connect the Unconnectable
* Aligning Subtitles in Sign Language Videos
* ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss
* Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
* Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
* Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies
* Animation Transformer: Visual Correspondence via Segment Matching, The
* Anonymizing Egocentric Videos
* Anticipative Video Transformer
* ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators
* ARCH++: Animation-Ready Clothed Human Reconstruction Revisited
* Architecture Disentanglement for Deep Neural Networks
* Are we Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection?
* Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
* ASCNet: Self-Supervised Video Representation Learning with Appearance-Speed Consistency
* Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
* ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
* Assignment-Space-based Multi-Object Tracking and Segmentation
* Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
* Asymmetric Loss For Multi-Label Classification
* Asynchronous Kalman Filter for Hybrid Event Cameras, An
* Attack as the Best Defense: Nullifying Image-to-image Translation GANs via Limit-aware Adversarial Attack
* Attack-Guided Perceptual Data Generation for Real-world Re-Identification
* Attention is not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion
* Attention-based Multi-Reference Learning for Image Super-Resolution
* Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition
* Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
* Audio-Visual Floorplan Reconstruction
* Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders
* Augmented Lagrangian Adversarial Attacks
* Augmenting Depth Estimation with Geospatial Context
* Auto Graph Encoder-Decoder for Neural Network Pruning
* Auto-Parsing Network for Image Captioning and Visual Question Answering
* AutoFormer: Searching Transformers for Visual Recognition
* AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
* AutoSpace: Neural Architecture Search with Less Human Interference
* Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
* BabelCalib: A Universal Approach to Calibrating Central Cameras
* Backdoor Attack against 3D Point Cloud Classifiers, A
* Baking Neural Radiance Fields for Real-Time View Synthesis
* BAPA-Net: Boundary Adaptation and Prototype Alignment for Cross-domain Semantic Segmentation
* BARF: Bundle-Adjusting Neural Radiance Fields
* Batch Normalization Increases Adversarial Vulnerability and Decreases Adversarial Transferability: A Non-Robust Feature Perspective
* Bayesian Deep Basis Fitting for Depth Completion with Uncertainty
* Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval
* Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
* Benchmarking Ultra-High-Definition Image Super-resolution
* Benefit of Distraction: Denoising Camera-Based Physiological Measurements using Inverse Attention, The
* Better Aggregation in Test-Time Augmentation
* BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning
* Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
* Beyond Road Extraction: A Dataset for Map Update using Aerial Images
* Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
* Bias Loss for Mobile Neural Networks
* BiaSwap: Removing Dataset Bias with Bias-Tailored Swapping Augmentation
* Bifold and Semantic Reasoning for Pedestrian Behavior Prediction
* Big Self-Supervised Models Advance Medical Image Classification
* BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
* Binocular Mutual Learning for Improving Few-shot Classification
* BioFors: A Large Biomedical Image Forensics Dataset
* Bit-Mixer: Mixed-precision networks with runtime bit-width selection
* Black-box Detection of Backdoor Attacks with Limited Information and Data
* BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
* BlockPlanner: City Block Generation with Vectorized Graph Representation
* BN-NAS: Neural Architecture Search with Batch Normalization
* Body-Face Joint Detection via Embedding and Head Hook
* Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion
* Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
* Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
* Bootstrap Your Own Correspondences
* BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
* Boundary-sensitive Pre-training for Temporal Localization in Videos
* Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
* Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation
* Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
* Bringing Events into Video Deblurring with Non-consecutively Blurry Frames
* Broad Study on the Transferability of Visual Representations with Contrastive Learning, A
* Broaden Your Views for Self-Supervised Video Learning
* Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation
* BuildingNet: Learning to Label 3D Buildings
* BV-Person: A Large-scale Dataset for Bird-view Person Re-identification
* C2N: Practical Generative Noise Modeling for Real-World Denoising
* C3-SemiSeg: Contrastive Semi-supervised Segmentation via Cross-set Learning and Dynamic Class-balancing
* CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization
* Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
* Calibrated and Partially Calibrated Semi-Generalized Homographies
* Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
* Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning
* Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?
* Can Shape Structure Features Improve Model Robustness under Diverse Adversarial Settings?
* CANet: A Context-Aware Network for Shadow Removal
* CanvasVAE: Learning to Generate Vector Graphic Documents
* CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds
* Cascade Image Matting with Deformable Graph Refinement
* CaT: Weakly Supervised Object Detection with Category Transfer
* Causal Attention for Unbiased Visual Recognition
* CCT-Net: Category-Invariant Cross-Domain Transfer for Medical Single-to-Multiple Disease Diagnosis
* CDNet: Centripetal Direction Network for Nuclear Instance Segmentation
* CDS: Cross-Domain Self-supervised Pre-training
* Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation, The
* Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
* Channel Augmented Joint Learning for Visible-Infrared Recognition
* Channel-wise Knowledge Distillation for Dense Prediction
* Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
* Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation
* Class Semantics-based Attention for Action Detection
* Class-Incremental Learning for Action Recognition in Videos
* CLEAR: Clean-up Sample-Targeted Backdoor in Neural Networks
* Click to Move: Controlling Video Generation with Sparse Motion
* Closer Look at Rotation-invariant Deep Point Cloud Analysis, A
* Clothing Status Awareness for Long-Term Person Re-Identification
* Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks
* Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
* Clustering by Maximizing Mutual Information Across Views
* CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
* Co-Scale Conv-Attentional Image Transformers
* Co2L: Contrastive Continual Learning
* Coarsely-labeled Data for Better Few-shot Transfer
* CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
* CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
* Collaborative and Adversarial Learning of Focused and Dispersive Representations for Semi-supervised Polyp Segmentation
* Collaborative Learning with Disentangled Features for Zero-shot Domain Adaptation
* Collaborative Optimization and Aggregation for Decentralized Domain Generalization and Adaptation
* Collaborative Unsupervised Visual Representation Learning from Decentralized Data
* Collaging Class-specific GANs for Semantic Image Synthesis
* CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
* COMISR: Compression-Informed Video Super-Resolution
* Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
* Complementary Patch for Weakly Supervised Semantic Segmentation
* Composable Augmentation Encoding for Video Representation Learning
* Compressing Visual-linguistic Model via Knowledge Distillation
* Concept Generalization in Visual Representation Learning
* Condensing a Sequence to One Informative Frame for Video Recognition
* Conditional DETR for Fast Training Convergence
* Conditional Diffusion for Interactive Segmentation
* Conditional Variational Capsule Network for Open Set Recognition
* CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution
* Confidence Calibration for Domain Generalization under Covariate Shift
* Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo, A
* Conformer: Local Features Coupling Global Representations for Visual Recognition
* Consistency-Aware Graph Network for Human Interaction Understanding
* Consistency-Sensitivity Guided Ensemble Black-Box Adversarial Attacks in Low-Dimensional Spaces
* Contact-Aware Retargeting of Skinned Motion
* Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation
* Context Reasoning Attention Network for Image Super-Resolution
* Context-aware Scene Graph Generation with Seq2Seq Transformers
* Context-Sensitive Temporal Feature Learning for Gait Recognition
* Contextually Plausible and Diverse 3D Human Motion Prediction
* Continual Learning for Image-Based Camera Localization
* Continual Learning on Noisy Data Streams via Self-Purified Replay
* Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
* Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
* Continuous Copy-Paste for One-stage Multi-object Tracking and Segmentation
* Contrast and Classify: Training Robust VQA Models
* Contrast and Order Representations for Video Self-Supervised Learning
* Contrasting Contrastive Self-Supervised Representation Learning Pipelines
* Contrastive Attention Maps for Self-supervised Co-localization
* Contrastive Coding for Active Learning under Class Distribution Mismatch
* Contrastive Learning for Label Efficient Semantic Segmentation
* Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency
* Contrastive Multimodal Fusion with TupleInfoNCE
* COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation
* Cortical Surface Shape Analysis Based on Alexandrov Polyhedra
* COTR: Correspondence Transformer for Matching Across Images
* Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
* CPF: Learning a Contact Potential Field to Model the Hand-Object Interaction
* CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds
* CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction
* CrackFormer: Transformer Network for Fine-Grained Crack Detection
* Cross-Camera Convolutional Color Constancy
* Cross-category Video Highlight Detection via Set-based Learning
* Cross-Descriptor Visual Localization and Mapping
* Cross-Encoder for Unsupervised Gaze Representation Learning
* Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation
* Cross-Patch Graph Convolutional Network for Image Denoising
* Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
* CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
* CrossDet: Crossline Representation for Object Detection
* CrossNorm and SelfNorm for Generalization under Distribution Shifts
* Crossover Learning for Fast Online Video Instance Segmentation
* CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
* Crowd Counting With Partial Annotations in an Image
* CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
* CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images
* CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
* CTRL-C: Camera calibration TRansformer with Line-Classification
* Curious Representation Learning for Embodied Intelligence
* Curvature Generation in Curved Spaces for Few-Shot Learning
* CvT: Introducing Convolutions to Vision Transformers
* D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
* DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
* DAM: Discrepancy Alignment Metric for Face Recognition
* Dance with Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos
* Dark Flash Normal Camera, A
* Data-free Universal Adversarial Perturbation and Black-box Attack
* DC-ShadowNet: Single-Image Hard and Soft Shadow Removal Using Unsupervised Domain-Classifier Guided Network
* DCT-SNN: Using DCT to Distribute Spatial Information over Time for Low-Latency Spiking Neural Networks
* De-rendering Stylized Texts
* DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders
* DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training
* Deep 3D Mask Volume for View Synthesis of Dynamic Scenes
* Deep Blind Video Super-resolution
* Deep Co-Training with Task Decomposition for Semi-Supervised Domain Adaptation
* Deep Edge-Aware Interactive Colorization against Color-Bleeding Effects
* Deep Halftoning with Reversible Binary Pattern
* Deep Hough Voting for Robust Global Registration
* Deep Hybrid Self-Prior for Full 3D Mesh Generation
* Deep Implicit Surface Point Prediction Networks
* Deep Matching Prior: Test-Time Optimization for Dense Correspondence
* Deep Metric Learning for Open World Semantic Segmentation
* Deep Permutation Equivariant Structure from Motion
* Deep Relational Metric Learning
* Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
* Deep Structured Instance Graph for Distilling Object Detectors
* Deep survival analysis with longitudinal X-rays for COVID-19
* Deep Symmetric Network for Underexposed Image Enhancement with Recurrent Attentional Learning
* Deep Transport Network for Unsupervised Video Object Segmentation
* Deep Virtual Markers for Articulated 3D Shapes
* DeepCAD: A Deep Generative Network for Computer-Aided Design Models
* DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
* DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras
* DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
* DeepPRO: Deep Partial Point Cloud Registration of Objects
* DeePSD: Automatic Deep Skinning And Pose Space Deformation For 3D Garment Animation
* Defending against Universal Adversarial Patches by Clipping Feature Norms
* Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image
* DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
* Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging
* Dense Interaction Learning for Video-based Person Re-identification
* Densely Guided Knowledge Distillation using Multiple Teacher Assistants
* DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
* DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
* DepthInSpace: Exploitation and Fusion of Multiple Video Frames for Structured-Light Depth Estimation
* DepthTrack: Unveiling the Power of RGBD Tracking
* Describing and Localizing Multiple Changes with Transformers
* Designing a Practical Degradation Model for Deep Blind Image Super-Resolution
* Detail Me More: Improving GAN's photo-realism of complex scenes
* DetCo: Unsupervised Contrastive Learning for Object Detection
* Detecting Human-Object Relationships in Videos
* Detecting Invisible People
* Detecting Persuasive Atypicality by Modeling Contextual Compatibility
* Detection and Continual Learning of Novel Face Presentation Attacks
* Detector-Free Weakly Supervised Grounding by Separation
* Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection, The
* Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation
* DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
* Differentiable Convolution Search for Point Cloud Processing
* Differentiable Dynamic Wirings for Neural Networks
* Differentiable Surface Rendering via Non-Differentiable Sampling
* Digging into Uncertainty in Self-supervised Multi-view Stereo
* Direct Differentiable Augmentation Search
* DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
* Discover the Unknown Biased Attribute of an Image Classifier
* Discovering 3D Parts from Image Collections
* Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection
* Discriminative Region-based Multi-Label Zero-Shot Learning
* Disentangled High Quality Salient Object Detection
* Disentangled Lifespan Face Synthesis
* Disentangled Representation for Age-Invariant Face Recognition: A Mutual Information Minimization Perspective
* Dissecting Image Crops
* Distance-aware Quantization
* Distillation-guided Image Inpainting
* Distilling Global and Local Logits with Densely Connected Relations
* Distilling Holistic Knowledge with Graph Neural Networks
* Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
* Distilling Virtual Examples for Long-tailed Recognition
* Distinctiveness oriented Positional Equilibrium for Point Cloud Registration
* Distributional Robustness Loss for Long-tail Learning
* DisUnknown: Distilling Unknown Factors for Disentanglement Learning
* DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization
* Diverse Image Style Transfer via Invertible Cross-Space Mapping
* Divide and Conquer for Single-frame Temporal Action Localization
* Divide and Contrast: Self-supervised Learning from Uncurated Data
* Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
* DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
* Do Different Deep Metric Learning Losses Lead to Similar Learned Features?
* Do Image Classifiers Generalize Across Time?
* DocFormer: End-to-End Transformer for Document Understanding
* DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
* Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
* Domain Adaptive Video Segmentation via Temporal Consistency Regularization
* Domain Generalization via Gradient Surgery
* Domain-Aware Universal Style Transfer
* Domain-Invariant Disentangled Network for Generalizable Object Detection
* DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer
* Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing
* DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
* DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation
* DRÆM: A discriminatively trained reconstruction embedding for surface anomaly detection
* DTMNet: A Discrete Tchebichef Moments-based Deep Neural Network for Multi-focus Image Fusion
* Dual Bipartite Graph Learning: A General Approach for Domain Adaptive Object Detection
* Dual Contrastive Loss and Attention for GANs
* Dual Path Learning for Domain Adaptation of Semantic Segmentation
* Dual Projection Generative Adversarial Networks for Conditional Image Generation
* Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
* Dual-Camera Super-Resolution with Aligned Attention Modules
* DualPoseNet: Category-level 6D Object Pose and Size Estimation Using Dual Pose Network with Refined Learning of Pose Consistency
* DWKS: A Local Descriptor of Deformations Between Meshes and Point Clouds
* Dynamic Attentive Graph Learning for Image Restoration
* Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
* Dynamic Cross Feature Fusion for Remote Sensing Pansharpening
* Dynamic CT Reconstruction from Limited Views with Implicit Neural Representations and Parametric Motion Fields
* Dynamic DETR: End-to-End Object Detection with Dynamic Attention
* Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
* Dynamic Dual Gating Neural Networks
* Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution
* Dynamic Network Quantization for Efficient Video Inference
* Dynamic Surface Function Networks for Clothed Human Bodies
* Dynamic View Synthesis from Dynamic Monocular Video
* Dynamical Pose Estimation
* e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
* EC-DARTS: Inducing Equalized and Consistent Optimization into DARTS
* ECACL: A Holistic Framework for Semi-Supervised Domain Adaptation
* ECS-Net: Improving Weakly Supervised Semantic Segmentation by Using Connections Between Class Activation Maps
* Editing Conditional Radiance Fields
* Effectively Leveraging Attributes for Visual Similarity
* Efficient Action Recognition via Dynamic Knowledge Propagation
* Efficient and Differentiable Shadow Computation for Inverse Problems
* Efficient Large Scale Inlier Voting for Geometric Vision Problems
* Efficient Video Compression via Content-Adaptive Super-Resolution
* Efficient Visual Pretraining with Contrastive Detection
* Egocentric Pose Estimation from Human Vision Span
* EgoRenderer: Rendering Human Avatars from Egocentric Camera Images
* EigenGAN: Layer-Wise Eigen-Learning for GANs
* Elaborative Rehearsal for Zero-shot Action Recognition
* Elastica Geodesic Approach with Convexity Shape Prior, An
* ELF-VC: Efficient Learned Flexible-Rate Video Coding
* ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
* ELSD: Efficient Line Segment Detector and Descriptor
* Else-Net: Elastic Semantic Network for Continual Action Recognition from Skeleton Data
* Embed Me If You Can: A Geometric Perceptron
* Embedding Novel Views in a Single JPEG Image
* Emerging Properties in Self-Supervised Vision Transformers
* Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation, An
* Empirical Study of Training Self-Supervised Vision Transformers, An
* Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
* End-to-End Dense Video Captioning with Parallel Decoding
* End-to-End Detection and Pose Estimation of Two Interacting Hands
* End-to-end Piece-wise Unwarping of Document Images
* End-to-end robust joint unsupervised image alignment and clustering
* End-to-End Semi-Supervised Object Detection with Soft Teacher
* End-to-End Trainable Trident Person Search Network Using Adaptive Gradient Propagation
* End-to-End Transformer Model for 3D Object Detection, An
* End-to-End Unsupervised Document Image Blind Denoising
* End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
* End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks
* Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
* Enhanced Boundary Learning for Glass-like Object Segmentation
* Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
* Enriching Local and Global Contexts for Temporal Action Localization
* Ensemble Attention Distillation for Privacy-Preserving Federated Learning
* Entropy Maximization and Meta Classification for Out-of-Distribution Detection in Semantic Segmentation
* Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments
* Episodic Transformer for Vision-and-Language Navigation
* EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo
* Equivariant Imaging: Learning Beyond the Range Space
* Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation
* Estimating Egocentric 3D Human Pose in Global Space
* Event Stream Super-Resolution via Spatiotemporal Constraint Learning
* Event-based Video Reconstruction Using Transformer
* Event-Intensity Stereo: Estimating Depth by the Best of Both Worlds
* EventHands: Real-Time Neural 3D Hand Pose Estimation from an Event Stream
* EventHPE: Event-based 3D Human Pose and Shape Estimation
* Evidential Deep Learning for Open Set Action Recognition
* EvIntSR-Net: Event Guided Multiple Latent Frames Reconstruction and Super-resolution
* Evolving Search Space for Neural Architecture Search
* Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
* Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
* Explainable Person Re-Identification with Attribute-guided Metric Distillation
* Explainable Video Entailment with Grounded Visual Evidence
* Explaining in Style: Training a GAN to explain a classifier in StyleSpace
* Explaining Local, Global, And Higher-Order Interactions In Deep Learning
* Explanations for Occluded Images
* Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
* Exploiting Explanations for Model Inversion Attacks
* Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
* Exploiting sample correlation for crowd counting with multi-expert network
* Exploiting Scene Graphs for Human-Object Interaction Detection
* Exploration and Estimation for Model Compression
* Exploring Classification Equilibrium in Long-Tailed Object Detection
* Exploring Cross-Image Pixel Contrast for Semantic Segmentation
* Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection
* Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
* Exploring Long Tail Visual Relationship Recognition with Large Vocabulary
* Exploring Relational Context for Multi-Task Dense Prediction
* Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation
* Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
* Exploring Temporal Coherence for More General Video Face Forgery Detection
* Exploring Visual Engagement Signals for Representation Learning
* Extending Neural P-frame Codecs for B-frame Coding
* Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice
* Extreme Structure from Motion for Indoor Panoramas without Visual Overlaps
* Extreme-Quality Computational Imaging via Degradation Framework
* F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain
* Face Image Retrieval with Attribute Manipulation
* FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
* Factorizing Perception and Policy for Interactive Instruction Following
* FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
* Fake it till you make it: face analysis in the wild using synthetic data alone
* FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
* FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
* FashionMirror: Co-attention Feature-remapping Virtual Try-on with Sequential Template Poses
* Fast and Efficient DNN Deployment via Deep Gaussian Transfer Learning
* Fast Convergence of DETR with Spatially Modulated Co-Attention
* Fast Light-field Disparity Estimation with Multi-disparity-scale Cost Aggregation
* Fast Video Moment Retrieval
* Faster Multi-Object Segmentation using Parallel Quadratic Pseudo-Boolean Optimization
* FastNeRF: High-Fidelity Neural Rendering at 200FPS
* FATNN: Fast and Accurate Ternary Neural Networks
* FcaNet: Frequency Channel Attention Networks
* Feature Importance-aware Transferable Adversarial Attacks
* Feature Interactive Representation for Point Cloud Registration
* Federated Learning for Non-IID Data via Unified Feature Learning and Optimization Objective Alignment
* Few-Shot and Continual Learning with Attentive Independent Mechanisms
* Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier
* Few-Shot Semantic Segmentation with Cyclic Memory Network
* Few-Shot Visual Relationship Co-Localization
* FFT-OT: A Fast Algorithm for Optimal Transportation
* Field Convolutions for Surface CNNs
* Field of Junctions: Extracting Boundary Structure at Low SNR
* Field-Guide-Inspired Zero-Shot Learning
* FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras
* Finding Representative Interpretations on Convolutional Neural Networks
* Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation
* FLAR: A Unified Prototype Framework for Few-sample Lifelong Active Recognition
* FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting
* Flow-Guided Video Inpainting with Scene Templates
* FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters
* FMODetect: Robust Detection of Fast Moving Objects
* Focal Frequency Loss for Image Reconstruction and Synthesis
* Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
* Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
* Fooling LiDAR Perception via Adversarial Trajectory Perturbation
* Foreground Activation Maps for Weakly Supervised Object Localization
* Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
* Fourier Space Losses for Efficient Perceptual Image Super-Resolution
* FOVEA: Foveated Image Magnification for Autonomous Navigation
* Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud
* FREE: Feature Refinement for Generalized Zero-Shot Learning
* Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving
* Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
* From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation
* From Continuity to Editability: Inverting GANs with Consecutive Images
* From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images
* From General to Specific: Informative Scene Graph Generation via Balance Adjustment
* From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting
* From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
* Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
* Full-Body Motion from a Single Head-Mounted Device: Generating SMPL Poses from Partial Observations
* Full-Duplex Strategy for Video Object Segmentation
* Full-Velocity Radar Returns by Radar-Camera Fusion
* Functional Correspondence Problem, The
* FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
* Fusion Moves for Graph Matching
* G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation
* Gait Recognition in the Wild: A Benchmark
* Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation
* GAN Inversion for Out-of-Range Images with Geometric Transformations
* GAN-Control: Explicitly Controllable GANs
* GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
* GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion
* Gated3D: Monocular 3D Object Detection From Temporal Illumination Cues
* Gaussian Fusion: Accurate 3D Reconstruction via Geometry-Guided Displacement Interpolation
* GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization
* General Recurrent Tracking Framework without Real Data, A
* Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
* Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
* Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
* Generalized Shuffled Linear Regression
* Generalized Source-free Domain Adaptation
* Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
* Generating Attribution Maps with Disentangled Masked Backpropagation
* Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos
* Generating Smooth Pose Sequences for Diverse Human Motion Prediction
* Generative Adversarial Registration for Improved Conditional Deformable Templates
* Generative Compositional Augmentations for Scene Graph Prediction
* Generative Layout Modeling using Constraint Graphs
* Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
* Generic Event Boundary Detection: A Benchmark for Event Segmentation
* Geography-Aware Self-Supervised Learning
* Geometric Deep Neural Network using Rigid and Non-Rigid Transformations for Human Action Recognition
* Geometric Granularity Aware Pixel-to-Mesh
* Geometric Unsupervised Domain Adaptation for Semantic Segmentation
* Geometry Uncertainty Projection Network for Monocular 3D Object Detection
* Geometry-Aware Self-Training for Unsupervised Domain Adaptation on Object Point Clouds
* Geometry-based Distance Decomposition for Monocular 3D Object Detection
* Geometry-Free View Synthesis: Transformers and no 3D Priors
* GeomNet: A Neural Network Based on Riemannian Geometries of SPD Matrix Space and Cholesky Space for 3D Skeleton-Based Interaction Recognition
* GistNet: a Geometric Structure Transfer Network for Long-Tailed Recognition
* Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration
* GLiT: Neural Architecture Search for Global and Local Image Transformer
* Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
* Globally Optimal and Efficient Manhattan Frame Estimation by Delimiting Rotation Search Space
* GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
* GNeRF: GAN-based Neural Radiance Field without Posed Camera
* Going deeper with Image Transformers
* GP-S3Net: Graph-based Panoptic Sparse Semantic Segmentation Network
* Gradient Distribution Alignment Certificates Better Adversarial Domain Adaptation
* Gradient Normalization for Generative Adversarial Networks
* Grafit: Learning fine-grained image representations with coarse labels
* Graph Constrained Data Representation Learning for Human Motion Segmentation
* Graph Contrastive Clustering
* Graph-BAS3Net: Boundary-Aware Semi-Supervised Segmentation Network with Bilateral Graph Convolution
* Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
* Graph-based Asynchronous Event Processing for Rapid Object Recognition
* Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
* GraphFPN: Graph Feature Pyramid Network for Object Detection
* Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
* Gravity-Aware Monocular 3D Human-Object Reconstruction
* Greedy Gradient Ensemble for Robust Visual Question Answering
* GRF: Learning a General Radiance Field for 3D Representation and Rendering
* GridToPix: Training Embodied Agents with Minimal Supervision
* Ground-Truth or DAER: Selective Re-Query of Secondary Information
* Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection
* Group-aware Contrastive Regression for Action Quality Assessment
* Group-Free 3D Object Detection via Transformers
* Group-wise Inhibition based Feature Regularization for Robust Classification
* GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
* GTT-Net: Learned Generalized Trajectory Triangulation
* Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation
* GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning
* H2O: A Benchmark for Visual Human-human Object Handover Analysis
* H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
* H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
* HAA500: Human-Centric Atomic Action Dataset with Curated Videos
* HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
* Hand Image Understanding via Deep Multi-Task Learning
* Hand-Object Contact Consistency Reasoning for Human Grasps Generation
* HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
* Handwriting Transformers
* Harnessing the Conditioning Sensorium for Improved Image Translation
* HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset
* HeadGAN: One-shot Neural Head Synthesis and Editing
* Heterogeneous Relational Complement for Vehicle Re-identification
* Hierarchical Aggregation for 3D Instance Segmentation
* Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling
* Hierarchical Disentangled Representation Learning for Outdoor Illumination Estimation and Editing
* Hierarchical Graph Attention Network for Few-shot Visual-Semantic Learning
* Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild
* Hierarchical Memory Matching Network for Video Object Segmentation
* Hierarchical Object-to-Zone Graph for Object Navigation
* Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection, A
* Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction, A
* HiFT: Hierarchical Feature Transformer for Aerial Tracking
* High Quality Disparity Remapping with Two-Stage Warping
* High-Fidelity Pluralistic Image Completion with Transformers
* High-Performance Discriminative Tracking with Transformers
* High-Resolution Optical Flow from 1D Attention and Correlation
* HighlightMe: Detecting Highlights from Human-Centric Videos
* HiNet: Deep Image Hiding by Invertible Network
* HIRE-SNN: Harnessing the Inherent Robustness of Energy-Efficient Deep Spiking Neural Networks by Training with Crafted Input Noise
* HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
* Holistic Pose Graph: Modeling Geometric Structure among Objects in a Scene using Graph Inference for 3D Object Prediction
* Homogeneous Architecture Augmentation for Neural Predictor
* How Shift Equivariance Impacts Metric Learning for Instance Segmentation
* How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
* How to Train Neural Networks for Flare Removal
* HPNet: Deep Primitive Segmentation Using Hybrid Representations
* HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
* Human Detection and Segmentation via Multi-view Consensus
* Human Pose Regression with Residual Log-likelihood Estimation
* Human Trajectory Prediction via Counterfactual Analysis
* HuMoR: 3D Human Motion Model for Robust Pose Estimation
* Hybrid Frequency-Spatial Domain Model for Sparse Image Reconstruction in Scanning Transmission Electron Microscopy, A
* Hybrid Neural Fusion for Full-frame Video Stabilization
* Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction, A
* Hypercorrelation Squeeze for Few-Shot Segmentation
* Hypergraph Neural Networks for Hypergraph Matching
* Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
* Hyperspectral Image Denoising with Realistic Data
* I2UV-HandNet: Image-to-UV Prediction Network for Accurate and High-fidelity 3D Hand Mesh Modeling
* ICE: Inter-Instance Contrastive Encoding for Unsupervised Person Re-identification
* ICON: Learning Regular Maps Through Inverse Consistency
* ID-Reveal: Identity-aware DeepFake Video Detection
* IDARTS: Interactive Differentiable Architecture Search
* IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID
* IICNet: A Generic Framework for Reversible Image Conversion
* ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models
* Image Harmonization with Transformer
* Image Inpainting via Conditional Texture and Structure Dual Generation
* Image Manipulation Detection by Multi-View Multi-Scale Supervision
* Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
* Image Shape Manipulation from a Single Augmented Training Sample
* Image Synthesis from Layout with Locality-Aware Mask Adaption
* Image Synthesis via Semantic Composition
* Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
* iMAP: Implicit Mapping and Positioning in Real-Time
* imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
* Impact of Aliasing on Generalization in Deep Convolutional Networks
* Improve Unsupervised Pretraining for Few-label Transfer
* Improving 3D Object Detection with Channel-wise Transformer
* Improving Contrastive Learning by Visualizing Feature Transformation
* Improving De-raining Generalization via Neural Reorganization
* Improving Generalization of Batch Whitening by Convolutional Unit Optimization
* Improving Low-Precision Network Quantization via Bin Regularization
* Improving Neural Network Efficiency via Post-training Quantization with Adaptive Floating-Point
* Improving robustness against common corruptions with frequency biased models
* Improving Robustness of Facial Landmark Detection by Defending against Adversarial Attacks
* In Defense of Scene Graphs for Image Captioning
* In-Place Scene Labelling and Understanding with Implicit Scene Representation
* In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces
* iNAS: Integral NAS for Device-Aware Salient Object Detection
* Incorporating Convolution Designs into Visual Transformers
* Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks
* Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images
* Inference of Black Hole Fluid-Dynamics from Sparse Interferometric Measurements
* Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories
* Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
* Influence Selection for Active Learning
* Influence-Balanced Loss for Imbalanced Visual Classification
* Information-theoretic regularization for Multi-source Domain Adaptation
* InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
* Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
* Instance Similarity Learning for Unsupervised Feature Representation
* Instance-level Image Retrieval using Reranking Transformers
* Instance-wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation
* InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
* Instances as Queries
* Integer-arithmetic-only Certified Robustness for Quantized Neural Networks
* Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image
* Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
* Interaction via Bi-directional Graph of Semantic Region Affinity for Scene Parsing
* Interactive Prototype Learning for Egocentric Action Recognition
* Internal Video Inpainting by Implicit Long-range Propagation
* Interpolation-Aware Padding for 3D Sparse Convolutional Neural Networks
* Interpretable Image Recognition by Constructing Transparent Embedding Space
* Interpretable Visual Reasoning via Induced Symbolic Space
* Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents
* Interpreting Attributions and Interactions of Adversarial Attacks
* IntraTomo: Self-supervised Learning-based Tomography via Sinogram Synthesis and Prediction
* Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer
* Inverting a Rolling Shutter Camera: Bring Rolling Shutter Images to High Framerate Global Shutter Video
* Invisible Backdoor Attack with Sample-Specific Triggers
* iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
* Is Pseudo-Lidar needed for Monocular 3D Object detection?
* ISD: Self-Supervised Learning by Iterative Similarity Distillation
* ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
* Iterative label cleaning for transductive and semi-supervised few-shot learning
* JEM++: Improved Techniques for Training JEM
* Joint Audio-Visual Deepfake Detection
* Joint Inductive and Transductive Learning for Video Object Segmentation
* Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data
* Joint Topology-preserving and Feature-refinement Network for Curvilinear Structure Segmentation
* Joint Visual and Audio Learning for Video Highlight Detection
* Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
* Just a Few Points are All You Need for Multi-view Stereo: A Novel Semi-supervised Learning Method for Multi-view Stereo
* Just Ask: Learning to Answer Questions from Millions of Narrated Videos
* Just One Moment: Structural Vulnerability of Deep Action Recognition against One Frame Attack
* (Just) A Spoonful of Refinements Helps the Registration Error Go Down
* Keep CALM and Improve Visual Feature Attribution
* Kernel Methods in Hyperbolic Spaces
* Keypoint Communities
* KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
* Knowledge Mining and Transferring for Domain Adaptive Object Detection
* Knowledge-Enriched Distributional Model Inversion Attacks
* KoDF: A Large-scale Korean DeepFake Detection Dataset
* Labels4Free: Unsupervised Segmentation using StyleGAN
* LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation
* LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments
* Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism
* LapsCore: Language-guided Person Search via Color Reasoning
* Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
* Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination
* Large-scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification
* Latent Transformations via NeuralODEs for GAN-based Image Editing
* Latent Transformer for Disentangled Face Editing in Images and Videos, A
* LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
* LayoutTransformer: Layout Generation and Completion with Self-attention
* Lazy Approach to Long-Horizon Gradient-Based Meta-Learning, A
* Learn to Cluster Faces via Pairwise Classification
* Learn to Match: Automatic Matching Network Design for Visual Tracking
* Learn-to-Race: A Multimodal Control Environment for Autonomous Racing
* Learnable Boundary Guided Adversarial Training
* Learned Spatial Representations for Few-shot Talking-Head Synthesis
* Learning A Single Network for Scale-Arbitrary Super-Resolution
* Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes
* Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
* Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection
* Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
* Learning Attribute-driven Disentangled Representations for Interactive Fashion Retrieval
* Learning Better Visual Data Similarities via New Grouplet Non-Euclidean Embedding
* Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
* Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
* Learning Canonical 3D Object Representation for Fine-Grained Recognition
* Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views
* Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions
* Learning Compatible Embeddings
* Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
* Learning Cross-Modal Contrastive Features for Video Domain Adaptation
* Learning Deep Local Features with Multiple Dynamic Attentions for Large-Scale Image Retrieval
* Learning Dual Priors for JPEG Compression Artifacts Removal
* Learning Dynamic Interpolation for Extremely Sparse Light Fields with Wide Baselines
* Learning Efficient Photometric Feature Transform for Multi-view Stereo
* Learning Facial Representations from the Cycle-consistency of Face
* Learning Fast Sample Re-weighting Without Reward Data
* Learning Frequency-aware Dynamic Network for Efficient Super-Resolution
* Learning from Noisy Data with Robust Representation Learning
* Learning Generative Models of Textured 3D Meshes from Real-World Images
* Learning Hierarchical Graph Neural Networks for Image Clustering
* Learning High-Fidelity Face Texture Completion without Complete Face Texture
* Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation
* Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting
* Learning Inner-Group Relations on Point Clouds
* Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
* Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization
* Learning Meta-class Memory for Few-Shot Semantic Segmentation
* Learning Motion Priors for 4D Human Body Capture in 3D Scenes
* Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation
* Learning Multi-Scene Absolute Pose Regression with Transformers
* Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing
* Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering
* Learning of Visual Relations: The Devil is in the Tails
* Learning Privacy-preserving Optics for Human Pose Estimation
* Learning Rare Category Classifiers on a Tight Labeling Budget
* Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
* Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
* Learning Self-Consistency for Deepfake Detection
* Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
* Learning Signed Distance Field for Multi-view Surface Reconstruction
* Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
* Learning Spatio-Temporal Transformer for Visual Tracking
* Learning specialized activation functions with the Piecewise Linear Unit
* Learning Target Candidate Association to Keep Track of What Not to Track
* Learning Temporal Dynamics from Cycles in Narrated Video
* Learning to Adversarially Blur Visual Object Tracking
* Learning to Better Segment Objects from Unseen Classes with Unlabeled Videos
* Learning to Bundle-adjust: A Graph Network Approach to Faster Optimization of Bundle Adjustment for Vehicular SLAM
* Learning to Cut by Watching Movies
* Learning to Discover Reflection Symmetry via Polar Matching Convolution
* Learning to Diversify for Single Domain Generalization
* Learning to drive from a world on rails
* Learning to Estimate Hidden Motions with Global Motion Aggregation
* Learning to Generate Scene Graph from Natural Language Supervision
* Learning to Hallucinate Examples from Extrinsic and Intrinsic Supervision
* Learning to Know Where to See: A Visibility-Aware Approach for Occluded Person Re-identification
* Learning to Match Features with Seeded Graph Matching Network
* Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data
* Learning to Regress Bodies from Images using Differentiable Semantic Rendering
* Learning to Remove Refractive Distortions from Underwater Images
* Learning to Resize Images for Computer Vision Tasks
* Learning to Stylize Novel Views
* Learning to Track Objects from Unlabeled Videos
* Learning to Track with Object Permanence
* Learning Unsupervised Metaformer for Anomaly Detection
* Learning with Memory-based Virtual Classes for Deep Metric Learning
* Learning with Noisy Labels for Robust Point Cloud Segmentation
* Learning with Noisy Labels via Sparse Regularization
* Learning with Privileged Tasks
* Let's See Clearly: Contaminant Artifact Removal for Moving Cameras
* Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
* LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
* LFI-CAM: Learning Feature Importance for Better Visual Explanation
* Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
* LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
* Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
* Light Source Guided Single-Image Flare Removal from Unpaired Data
* Light Stage on Every Desk, A
* Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras
* Likelihood-Based Diverse Sampling for Trajectory Forecasting
* Linguistically Routing Capsule Network for Out-of-distribution Visual Question Answering
* Lipschitz Continuity Guided Knowledge Distillation
* LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
* Local Temperature Scaling for Probability Calibration
* Localize to Binauralize: Audio Spatialization from Visual Sound Source Localization
* Localized Simple Multiple Kernel K-means
* LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation
* Location-aware Single Image Reflection Removal
* LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
* LoFGAN: Fusing Local Representations for Few-shot Image Generation
* LOKI: Long Term and Key Intentions for Trajectory Prediction
* Long Short View Feature Decomposition via Contrastive Video Representation Learning
* Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data
* Looking here or there? Gaze Following in 360-Degree Images
* LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
* LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
* Low Curvature Activations Reduce Overfitting in Adversarial Training
* Low-Rank Tensor Completion by Approximating the Tensor Average Rank
* Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
* LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
* LSG-CPD: Coherent Point Drift with Local Surface Geometry for Point Cloud Registration
* Lucas-Kanade Reloaded: End-to-End Super-Resolution from Raw Image Bursts
* M3D-VTON: A Monocular-to-3D Virtual Try-On Network
* MAAS: Multi-modal Assignation for Active Speaker Detection
* Machine Teaching Framework for Scalable Recognition, A
* Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths
* Manifold Alignment for Semantically Aligned Style Transfer
* Manifold Matching via Deep Metric Learning for Generative Modeling
* Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization, The
* Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
* MBA-VO: Motion Blur Aware Visual Odometry
* mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets
* MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
* Me-Momentum: Extracting Hard Confident Examples from Noisily Labeled Data
* ME-PCN: Point Completion Conditioned on Mask Emptiness
* Mean Shift for Self-Supervised Learning
* MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
* Membership Inference Attacks are Easier on Difficult Problems
* Memory-Augmented Dynamic Neural Relational Inference
* Mesh Graphormer
* MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
* Meta Gradient Adversarial Attack
* Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness
* Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning
* Meta Pairwise Relationship Distillation for Unsupervised Person Re-identification
* Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
* Meta-Attack: Class-agnostic and Model-agnostic Physical Adversarial Attack
* Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
* Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
* MFNet: Multi-filter Directive Network for Weakly Supervised Salient Object Detection
* MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
* MGNet: Monocular Geometric Scene Understanding for Autonomous Driving
* MGSampler: An Explainable Sampling Strategy for Video Action Recognition
* MicroNet: Improving Image Recognition with Extremely Low FLOPs
* MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
* Minimal Adversarial Examples for Deep Learning on 3D Point Clouds
* Minimal Cases for Computing the Generalized Relative Pose using Affine Correspondences
* Minimal Solutions for Panoramic Stitching Given Gravity Prior
* Mining Contextual Information Beyond Image for Semantic Segmentation
* Mining Latent Classes for Few-shot Segmentation
* Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
* Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting
* Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
* MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing
* MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
* Mixture-based Feature Space Learning for Few-shot Image Classification
* MLVSNet: Multi-level Voting Siamese Network for 3D Visual Tracking
* Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
* Modulated Graph Convolutional Network for 3D Human Pose Estimation
* Modulated Periodic Activations for Generalizable Local Functional Representations
* Monocular, One-stage, Regression of Multiple 3D People
* MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
* MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans
* Morphable Detector for Object Detection on Demand
* MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection
* Motion Adaptive Pose Estimation from Compressed Videos
* Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
* Motion Deblurring with Real Events
* Motion Guided Attention Fusion to Recognize Interactions from Videos
* Motion Guided Region Message Passing for Video Captioning
* Motion Prediction using Trajectory Cues
* Motion-Augmented Self-Training for Video Recognition at Smaller Scale
* Motion-Aware Dynamic Architecture for Efficient Frame Interpolation
* Motion-Focused Contrastive Learning of Video Representations
* MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
* Move2Hear: Active Audio-Visual Source Separation
* MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
* MT-ORL: Multi-Task Occlusion Relationship Learning
* Multi-Anchor Active Domain Adaptation for Semantic Segmentation
* Multi-Class Cell Detection Using Spatial Context Representation
* Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
* Multi-Echo LiDAR for 3D Object Detection
* Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
* Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation
* Multi-Level Curriculum for Training A Distortion-Aware Barrel Distortion Rectification Model
* Multi-Modal Multi-Action Video Recognition
* Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
* Multi-Mode Modulator for Multi-Domain Few-Shot Classification, A
* Multi-scale Matching Networks for Semantic Correspondence
* Multi-Scale Separable Network for Ultra-High-Definition Video Deblurring
* Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
* Multi-Source Domain Adaptation for Object Detection
* Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
* Multi-Task Self-Training for Learning General Representations
* Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering
* Multi-view 3D Reconstruction with Transformers
* Multi-View Radar Semantic Segmentation
* Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
* Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images
* Multimodal Knowledge Expansion
* Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts
* Multiple Pairwise Ranking Networks for Personalized Video Summarization
* Multiresolution Deep Implicit Functions for 3D Shape Representation
* Multiscale Vision Transformers
* MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
* Multispectral illumination estimation using deep unrolling network
* MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
* Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection
* Multiview Pseudo-Labeling for Semi-supervised Learning from Video
* MUSIQ: Multi-Scale Image Quality Transformer
* Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
* Mutual Supervision for Dense Object Detection
* Mutual-Complementing Framework for Nuclei Detection and Segmentation in Pathology Image
* MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo
* MVTN: Multi-View Transformation Network for 3D Shape Recognition
* N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras
* NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization
* NASOA: Towards Faster Task-Oriented Online Fine-Tuning with a Zoo of Models
* Naturalistic Physical Adversarial Patch for Object Detectors
* NEAT: Neural Attention Fields for End-to-End Autonomous Driving
* NeRD: Neural Reflectance Decomposition from Image Collections
* Nerfies: Deformable Neural Radiance Fields
* NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
* Neural Architecture Search for Joint Human Parsing and Pose Estimation
* Neural Articulated Radiance Field
* Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition
* Neural Photofit: Gaze-based Mental Image Reconstruction
* Neural Radiance Flow for 4D View Synthesis and Video Processing
* Neural Strokes: Stylized Line Drawing of 3D Shapes
* Neural TMDlayer: Modeling Instantaneous flow of features via SDE Generators
* Neural Video Portrait Relighting in Real-time via Consistency Modeling
* Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing
* NeuSpike-Net: High Speed Video Reconstruction via Bio-inspired Neuromorphic Cameras
* New Journey from SDRTV to HDRTV, A
* NGC: A Unified Framework for Learning with Open-World Noisy Data
* Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
* Normalization Matters in Weakly Supervised Object Localization
* Normalized Human Pose Features for Human Action Video Alignment
* Not All Operations Contribute Equally: Hierarchical Operation-adaptive Predictor for Neural Architecture Search
* NPMs: Neural Parametric Models for 3D Deformable Shapes
* OadTR: Online Action Detection with Transformers
* Object Tracking by Jointly Exploiting Frame and Event Domain
* Objects as Cameras: Estimating High-Frequency Illumination from Shadows
* Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID
* Occluded Person Re-Identification with Single-scale Global Representations
* Occlusion-Aware Video Object Inpainting
* ODAM: Object Detection, Association, and Mapping using Posed RGB Video
* OMNet: Learning Overlapping Mask for Partial-to-Partial Point Cloud Registration
* Omni-GAN: On the Secrets of cGANs and Beyond
* Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
* Omniscient Video Super-Resolution
* On Compositions of Transformations in Contrastive Self-Supervised Learning
* On Equivariant and Invariant Learning of Object Landmark Representations
* On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
* On Feature Decorrelation in Self-Supervised Learning
* On Generating Transferable Targeted Perturbations
* On the hidden treasure of dialog in video question answering
* On the Importance of Distractors for Few-Shot Classification
* On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
* On the Robustness of Vision Transformers to Adversarial Examples
* Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
* One-pass Multi-view Clustering for Large-scale Data
* Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
* Online Knowledge Distillation for Efficient Pose Estimation
* Online Multi-Granularity Distillation for GAN Compression
* Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification
* Online Refinement of Low-level Feature Based Activation Map for Weakly Supervised Object Localization
* Online-trained Upsampler for Deep Low Complexity Video Compression
* OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
* OpenGAN: Open-Set Recognition via Open Data Generation
* ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition
* Oriented R-CNN for Object Detection
* Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
* Orthogonal Projection Loss
* Orthographic-Perspective Epipolar Geometry
* OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution
* Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
* Out-of-Core Surface Reconstruction via Global TGV Minimization
* OVANet: One-vs-All Network for Universal Domain Adaptation
* Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
* P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching
* Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
* Painting from Part
* Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
* Panoptic Narrative Grounding
* Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks
* Parallel Detection-and-Segmentation Learning for Weakly Supervised Instance Segmentation
* Parallel Multi-Resolution Fusion Network for Image Inpainting
* Parallel Rectangle Flip Attack: A Query-based Black-box Attack against Object Detection
* Parametric Contrastive Learning
* PARE: Part Attention Regressor for 3D Human Body Estimation
* Parsing Table Structures in the Wild
* Partial Off-policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning
* Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network
* Partner-Assisted Learning for Few-Shot Image Classification
* PARTS: Unsupervised segmentation with slots, attention and independence maximization
* PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
* Patch Craft: Video Denoising by Deep Modeling and Patch Matching
* Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
* PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
* Pathdreamer: A World Model for Indoor Navigation
* PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds
* Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
* Perceptual Variousness Motion Deblurring with Light Global Context Refinement
* Persistent Homology based Graph Convolution Network for Fine-grained 3D Shape Segmentation
* Personalized and Invertible Face De-identification by Disentangled Identity Information Manipulation
* Personalized Image Semantic Segmentation
* Personalized Trajectory Prediction via Distribution Discrimination
* Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation
* Photon-Starved Scene Inference using Single Photon Cameras
* Physics-based Differentiable Depth Sensor Simulation
* Physics-based Human Motion Estimation and Synthesis from Videos
* Physics-Enhanced Machine Learning for Virtual Fluorescence Microscopy
* Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
* PIAP-DF: Pixel-Interested and Anti Person-Specific Facial Action Unit Detection Net with Discrete Feedback Learning
* PICCOLO: Point Cloud-Centric Omnidirectional Localization
* PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
* PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
* Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
* Pixel Difference Networks for Efficient Edge Detection
* Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
* PixelPyramids: Exact Inference Models from Lossless Image Pyramids
* PixelSynth: Generating a 3D-Consistent Experience from a Single Image
* Planar Surface Reconstruction from Sparse Views
* PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
* PlenOctrees for Real-time Rendering of Neural Radiance Fields
* PnP-DETR: Towards Efficient Visual Analysis with Transformers
* PoGO-Net: Pose Graph Optimization with Graph Neural Networks
* Point Cloud Augmentation with Weighted Local Transformations
* Point Transformer
* Point-Based Modeling of Human Clothing
* Point-set Distances for Learning Representations of 3D Point Clouds
* PointBA: Towards Backdoor Attacks in 3D Point Cloud
* PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
* Polarimetric Helmholtz Stereopsis
* Poly-NL: Linear Complexity Non-local Layers With 3rd Order Polynomials
* Pose Correction for Highly Accurate Visual Localization in Large-scale Indoor Spaces
* Pose Invariant Topological Memory for Visual Navigation
* Power of Points for Modeling Humans in Clothing, The
* PR-GCN: A Deep Graph Convolutional Network with Point Refinement for 6D Pose Estimation
* PR-Net: Preference Reasoning for Personalized Video Highlight Detection
* PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
* Practical Relative Order Attack in Deep Ranking
* PreDet: Large-scale weakly supervised pre-training for detection
* Predicting with Confidence on Unseen Distributions
* Prediction by Anticipation: An Action-Conditional Prediction Method based on Interaction Learning
* Predictive Feature Learning for Future Segmentation Prediction
* Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts
* Pri3D: Can 3D Priors Help 2D Representation Learning?
* PrimitiveNet: Primitive Instance Segmentation with Local Primitive Embedding under Adversarial Metric
* Prior to Segment: Foreground Cues for Weakly Annotated Classes in Partially Supervised Instance Segmentation
* Probabilistic Modeling for Human Mesh Recovery
* Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
* Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
* Procrustean Training for Imbalanced Deep Learning
* Product Quantizer Aware Inverted Index for Scalable Nearest Neighbor Search
* Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining
* ProFlip: Targeted Trojan Attack with Progressive Bit Flips
* Progressive Correspondence Pruning by Consensus Learning
* Progressive Seed Generation Auto-Encoder for Unsupervised Point Cloud Learning
* Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
* Provably Approximated Point Cloud Registration
* Pseudo-loss Confidence Metric for Semi-supervised Few-shot Learning
* Pseudo-mask Matters in Weakly-supervised Semantic Segmentation
* PT-CapsNet: A Novel Prediction-Tuning Capsule Network Suitable for Deeper Architectures
* PU-EVA: An Edge-Vector based Approximation Solution for Flexible-scale Point Cloud Upsampling
* Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory, The
* Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
* PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks
* PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
* Pyramid Architecture Search for Real-Time Image Deblurring
* Pyramid Point Cloud Transformer for Large-Scale Place Recognition
* Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
* Pyramid Spatial-Temporal Aggregation for Video-based Person Re-Identification
* Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
* Q-Match: Iterative Shape Matching via Quantum Annealing
* Query Adaptive Few-Shot Object Detection with Heterogeneous Graph Convolutional Networks
* R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating
* R-SLAM: Optimizing Eye Tracking from Rolling Shutter Video of the Retina
* Radial Distortion Invariant Factorization for Structure from Motion
* RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
* RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
* RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection
* Rank & Sort Loss for Object Detection and Instance Segmentation
* RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
* Ranking Models in Unlabeled New Environments
* Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
* RDA: Robust Domain Adaptation via Fourier Adversarial Attacking
* RDI-Net: Relational Dynamic Inference Networks
* Re-Aging GAN: Toward Personalized Face Age Transformation
* Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
* Re-energizing Domain Discriminator with Sample Relabeling for Adversarial Domain Adaptation
* Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
* Real-time Instance Segmentation with Discriminative Orientation Maps
* Real-time Vanishing Point Detector Integrating Under-parameterized RANSAC and Hough Transform
* Real-Time Video Inference on Edge Devices via Adaptive Model Streaming
* Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme
* Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization
* RECALL: Replay-based Continual Learning in Semantic Segmentation
* Reconcile Prediction Consistency for Balanced Object Detection
* ReconfigISP: Reconfigurable Camera Image Processing Pipeline
* Reconstructing Hand-Object Interactions in the Wild
* ReCU: Reviving the Dead Weights in Binary Neural Networks
* Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
* Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
* ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
* Refining Action Segmentation with Hierarchical Video Representations
* Refining activation downsampling with SoftPool
* Region Similarity Representation Learning
* Region-aware Contrastive Learning for Semantic Segmentation
* Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
* Rehearsal revealed: The limits and merits of revisiting samples in continual learning
* Relating Adversarially Robust Generalization to Flat Minima
* Relational Embedding for Few-Shot Classification
* Relaxed Transformer Decoders for Direct Action Proposal Generation
* Reliably fast adversarial training via latent adversarial perturbation
* Removing Adversarial Noise in Class Activation Feature Space
* Removing the Bias of Integral Pose Regression
* RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering
* Representative Color Transform for Image Enhancement
* Residual Attention: A Simple but Effective Method for Multi-Label Recognition
* ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
* ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
* Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning
* Rethinking and Improving Relative Position Encoding for Vision Transformer
* Rethinking Coarse-to-Fine Approach in Single Image Deblurring
* Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
* Rethinking Deep Image Prior for Denoising
* Rethinking Noise Synthesis and Modeling in Raw Denoising
* Rethinking preventing class-collapsing in metric learning with margin-based losses
* Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
* Rethinking Spatial Dimensions of Vision Transformers
* Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
* Rethinking the Truly Unsupervised Image-to-Image Translation
* Rethinking Transformer-based Set Prediction for Object Detection
* RetrievalFuse: Neural 3D Scene Reconstruction with a Database
* Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval
* Revealing the Reciprocal Relations between Self-Supervised Stereo and Monocular Depth Estimation
* Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
* Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
* Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation
* RFNet: Recurrent Forward Network for Dense Point Cloud Completion
* RFNet: Region-aware Fusion Network for Incomplete Multi-modal Brain Tumor Segmentation
* RGB-D Saliency Detection via Cascaded Mutual Information Minimization
* Right to Talk: An Audio-Visual Transformer Approach, The
* RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
* RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions
* Road Anomaly Detection by Partial Image Reconstruction with Segmentation Coupling
* Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation, The
* Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS
* Robust Automatic Monocular Vehicle Speed Estimation for Traffic Surveillance
* Robust Loss for Point Cloud Registration, A
* Robust Object Detection via Instance-Level Temporal Cycle Confusion
* Robust Small Object Detection on the Water Surface through Fusion of Camera and Millimeter Wave Radar
* Robust Small-scale Pedestrian Detection with Cued Recall via Memory Learning
* Robust Trust Region for Weakly Supervised Segmentation
* Robust Watermarking for Deep Neural Networks via Bi-level Optimization
* RobustNav: Towards Benchmarking Robustness in Embodied Navigation
* Robustness and Generalization via Generative Adversarial Training
* Robustness Certification for Point Cloud Models
* Robustness via Cross-Domain Ensembles
* Rotation Averaging in a Split Second: A Primal-Dual Method and a Closed-Form for Cycle Graphs
* RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation
* S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
* SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks
* SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing
* SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-powered Intelligent PhlatCam
* Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
* Saliency-Associated Object Tracking
* Salient Object Ranking with Position-Preserved Attention
* Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
* Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
* Sat2Vid: Street-view Panoramic Video Synthesis from a Single Satellite Image
* SAT: 2D Semantics Assisted Training for 3D Visual Grounding
* Scalable Vision Transformers with Hierarchical Pooling
* Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
* Scaling up instance annotation via label propagation
* Scaling-up Disentanglement for Image Translation
* Scene Context-Aware Salient Object Detection
* Scene Synthesis via Uncertainty-Driven Attribute Synchronization
* Score-Based Point Cloud Denoising
* SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
* Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace
* Scribble-Supervised Semantic Segmentation Inference
* Searching for Controllable Image Restoration Networks
* Searching for Robustness: Loss Learning for Noisy Classification Tasks
* Searching for Two-Stream Models in Multivariate Space for Video Recognition
* Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
* Seeing Dynamic Scene in the Dark: A High-Quality Video Dataset with Mechatronic Alignment
* Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection
* Segmentation-grounded Scene Graph Generation
* Segmenter: Transformer for Semantic Segmentation
* Selective Feature Compression for Efficient Activity Recognition Inference
* Self Supervision to Distillation for Long-Tailed Visual Recognition
* Self-born Wiring for Neural Trees
* Self-Calibrating Neural Radiance Fields
* Self-Conditioned Probabilistic Learning of Video Rescaling
* Self-Knowledge Distillation with Progressive Refinement of Targets
* Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
* Self-Mutating Network for Domain Adaptive Segmentation of Aerial Images
* Self-Mutual Distillation Learning for Continuous Sign Language Recognition
* Self-Regulation for Semantic Segmentation
* Self-Supervised 3D Face Reconstruction via Conditional Estimation
* Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning
* Self-supervised 3D Skeleton Action Representation Learning with Motion Consistency and Continuity
* Self-Supervised Cryo-Electron Tomography Volumetric Image Restoration from Single Noisy Volume with Sparsity Constraint
* Self-supervised Domain Adaptation for Forgery Localization of JPEG Compressed Images
* Self-supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond
* Self-Supervised Image Prior Learning with GMM from a Single Noisy Image
* Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
* Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging
* Self-Supervised Object Detection via Generative Image Synthesis
* Self-Supervised Pretraining of 3D Features on any Point-Cloud
* Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
* Self-Supervised Real-to-Sim Scene Generation
* Self-Supervised Representation Learning from Flow Equivariance
* Self-supervised Transfer Learning for Hand Mesh Recovery from Binocular Images
* Self-Supervised Vessel Segmentation via Adversarial Learning
* Self-supervised Video Object Segmentation by Motion Grouping
* Self-Supervised Video Representation Learning with Meta-Contrastive Network
* Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
* SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
* SeLFVi: Self-supervised Light-Field Video Reconstruction from Stereo Video
* Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images with Artificial Neural Networks
* Semantic Concentration for Domain Adaptation
* Semantic Diversity Learning for Zero-Shot Multi-label Classification
* Semantic Perturbations with Normalizing Flows for Improved Generalization
* Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild
* Semantically Coherent Out-of-Distribution Detection
* Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics
* Semantics Disentangling for Generalized Zero-Shot Learning
* Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples with Graph-based Virtual Labels
* Semi-Supervised Active Learning with Temporal Output Discrepancy
* Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples
* Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank
* Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation
* SemIE: Semantically-aware Image Extrapolation
* SemiHand: Semi-supervised Hand Pose Estimation with Consistency
* Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation
* Sensor-Guided Optical Flow
* SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation
* Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation
* SGMNet: Learning Rotation-Invariant Point Cloud Representations via Sorted Gram Matrix
* SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation
* Shallow Bayesian Meta Learning for Real-World Few-Shot Recognition
* Shape Self-Correction for Unsupervised Point Cloud Understanding
* Shape-Aware Multi-Person Pose Estimation from Multi-View Images
* Shape-Biased Domain Generalization via Shock Graph Embeddings
* ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation
* SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
* SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
* SIGNET: Efficient Neural Representation for Light Fields
* Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation, A
* Simple Baseline for Weakly-Supervised Scene Graph Generation, A
* Simple Feature Augmentation for Domain Generalization, A
* Simple Framework for 3D Lensless Imaging with Programmable Masks, A
* Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
* SimROD: A Simple Adaptation Method for Robust Object Detection
* SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
* Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning
* Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
* Single View Physical Distance Estimation using Human Pose
* Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics
* Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
* Skeleton2Mesh: Kinematics Prior Injected Unsupervised Human Mesh Recovery
* Sketch Your Own GAN
* Sketch2Mesh: Reconstructing and Editing 3D Shapes from Sketches
* SketchAA: Abstract Representation for Abstract Sketches
* SketchLattice: Latticed Representation for Sketch Manipulation
* SLAMP: Stochastic Latent Appearance and Motion Prediction
* SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
* SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation
* SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
* SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes
* SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
* SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
* Social Fabric: Tubelet Compositions for Video Relation Detection
* Social NCE: Contrastive Learning of Socially-aware Motion Representations
* Solving Inefficiency of Self-supervised Representation Learning
* SOMA: Solving Optical Marker-Based MoCap Automatically
* SOTR: Segmenting Objects with Transformers
* Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
* Space-Time-Separable Graph Convolutional Network for Pose Forecasting
* Sparse Needlets for Lighting Estimation with Spherical Transport Loss
* Sparse-shot Learning with Exclusive Cross-Entropy for Extremely Many Localisations
* Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation
* SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
* Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition
* Spatial Uncertainty-Aware Semi-Supervised Crowd Counting
* Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting
* Spatial-Temporal Transformer for Dynamic Scene Graph Generation
* Spatially Conditioned Graphs for Detecting Human-Object Interactions
* Spatially-Adaptive Image Restoration using Distortion-Guided Networks
* Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
* Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data, The
* Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
* Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
* SPEC: Seeing People in the Wild with an Estimated Camera
* Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation
* Specificity-preserving RGB-D Saliency Detection
* Spectral Leakage and Rethinking the Kernel Size in CNNs
* Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
* SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation
* Square Root Marginalization for Sliding-Window Bundle Adjustment
* SS-IL: Separated Softmax for Incremental Learning
* SSH: A Self-Supervised Framework for Image Harmonization
* Stacked Homography Transformations for Multi-View Pedestrian Detection
* Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
* STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement
* StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
* Statistically Consistent Saliency Estimation
* STEM: An approach to Multi-source Domain Adaptation with Guarantees
* StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
* Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-grained Recognition
* Stochastic Scene-Aware Motion Prediction
* Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation
* STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing
* Striking a Balance between Stability and Plasticity for Class-Incremental Learning
* STRIVE: Scene Text Replacement In Videos
* StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
* Structure-from-Sherds: Incremental 3D Reassembly of Axially Symmetric Pots from Unordered and Mixed Fragment Collections
* Structure-Preserving Deraining with Residue Channel Prior Guidance
* Structure-transformed Texture-enhanced Network for Person Image Synthesis
* Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
* Structured Outdoor Architecture Reconstruction by Exploration and Classification
* Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher
* STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding
* Style and Semantic Memory Mechanism for Domain Generalization, A
* StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
* StyleFormer: Real-time Arbitrary Style Transfer via Parametric Style Composition
* Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks
* Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection
* SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
* Super Resolve Dynamic Scene from Continuous Spike Streams
* Super-Resolving Cross-Domain Face Miniatures by Peeking at One-Shot Exemplar
* Superpoint Network for Point Cloud Oversegmentation
* Support-Set Based Cross-Supervision for Video Grounding
* SurfaceNet: Adversarial SVBRDF Estimation from a Single Image
* SurfGen: Adversarial 3D Shape Synthesis with Explicit Surface Discriminators
* Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation, The
* surprising impact of mask-head architecture on novel class segmentation, The
* Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
* Switchable K-class Hyperplanes for Noise-Robust Representation Learning
* Synchronization of Group-labelled Multi-graphs
* Syncretic Modality Collaborative Learning for Visible Infrared Person Re-Identification
* SynFace: Face Recognition with Synthetic Data
* Synthesis of Compositional Animations from Textual Descriptions
* Synthesized Feature based Few-Shot Class-Incremental Learning on a Mixture of Subspaces
* T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging
* T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning
* T-SVDNet: Exploring High-Order Prototypical Correlations for Multi-Source Domain Adaptation
* TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
* Talk-to-Edit: Fine-Grained Facial Editing via Dialog
* TAM: Temporal Adaptive Module for Video Recognition
* Target Adaptive Context Aggregation for Video Scene Graph Generation
* Task Switching Network for Multi-task Learning
* Task-aware Part Mining Network for Few-Shot Learning
* Teacher-Student Adversarial Depth Hallucination to Improve Face Recognition
* TeachText: CrossModal Generalized Distillation for Text-Video Retrieval
* Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval
* TempNet: Online Semantic Segmentation on Large-scale Point Cloud Series
* Temporal Action Detection with Multi-level Supervision
* Temporal Cue Guided Video Highlight Detection with Low-Rank Audio-Visual Fusion
* Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
* Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
* Temporally-Coherent Surface Reconstruction via Metric-Consistent Atlases
* Testing using Privileged Information by Adapting Features with Statistical Dependence
* Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
* TF-Blender: Temporal Feature Blender for Video Object Detection
* TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition
* THDA: Treasure Hunt Data Augmentation for Semantic Navigation
* Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis
* THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers
* Time-Equivariant Contrastive Video Representation Learning
* Time-Multiplexed Coded Aperture Imaging: Learned Coded Aperture and Pixel Exposures for Compressive Imaging Systems
* TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning
* TMCOSS: Thresholded Multi-Criteria Online Subset Selection for Data-Efficient Autonomous Driving
* TokenPose: Learning Keypoint Tokens for Human Pose Estimation
* Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
* TOOD: Task-aligned One-stage Object Detection
* Topic Scene Graph Generation by Attention Distillation from Caption
* Topologically Consistent Multi-View Face Inference Using Volumetric Sampling
* Toward a Visual Concept Vocabulary for GAN Latent Space
* Toward Human-Like Grasp: Dexterous Grasping via Semantic Representation of Object-Hand
* Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images
* Toward Spatially Unbiased Generative Models
* Towards A Universal Model for Cross-Dataset Crowd Counting
* Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction
* Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation
* Towards Better Explanations of Class Activation Mapping
* Towards Complete Scene and Regular Shape for Distortion Rectification by Curve-Aware Extrapolation
* Towards Discovery and Attribution of Open-world GAN Generated Images
* Towards Discriminative Representation Learning for Unsupervised Person Re-identification
* Towards Efficient Graph Convolutional Networks for Point Cloud Handling
* Towards Face Encryption by Generating Adversarial Identity Masks
* Towards Flexible Blind JPEG Artifacts Removal
* Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing
* Towards Interpretable Deep Metric Learning with Structural Matching
* Towards Learning Spatially Discriminative Feature Representations
* Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation
* Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
* Towards Novel Target Discovery Through Open-Set Domain Adaptation
* Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark
* Towards Real-world X-ray Security Inspection: A High-Quality Benchmark And Lateral Inhibition Module For Prohibited Items Detection
* Towards Robustness of Deep Neural Networks via Regularization
* Towards Rotation Invariance in Object Detection
* Towards the Unseen: Iterative Text Recognition by Distilling from Errors
* Towards Understanding the Generative Capability of Adversarially Robust Classifiers
* Towards Vivid and Diverse Image Colorization with Generative Color Prior
* Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
* Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking
* Training Multi-Object Detector by Estimating Bounding Box Distribution for Input Image
* Training Weakly Supervised Video Frame Interpolation with Events
* Transductive Few-Shot Classification on the Oblique Manifold
* TransFER: Learning Relation-aware Facial Expression Representations with Transformers
* TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets
* TransForensics: Image Forgery Localization with Dense Self-Attention
* Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction
* Transformer-based Dual Relation Graph for Multi-label Image Recognition
* Transforms based Tensor Robust PCA: Corrupted Low-Rank Tensors Recovery via Convex Optimization
* Transfusion: A Novel SLAM Method Focused on Transparent Objects
* Transparent Object Tracking Benchmark
* Transporting Causal Mechanisms for Unsupervised Domain Adaptation
* TransPose: Keypoint Localization via Transformer
* TransReID: Transformer-based Object Re-Identification
* TransVG: End-to-End Visual Grounding with Transformers
* TransView: Inside, Outside, and Across the Cropping View Boundaries
* TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
* Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
* TravelNet: Self-supervised Physically Plausible Hand Motion Learning from Monocular Color Images
* Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation
* Tripartite Information Mining and Integration for Image Matting
* TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
* TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation
* TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
* Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
* UASNet: Uncertainty Adaptive Sampling Network for Deep Stereo Matching
* Ultra-High-Definition Image HDR Reconstruction via Collaborative Bilateral Learning
* UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model
* Unaligned Image-to-Image Translation by Learning to Reweight
* Uncertainty-Aware Human Mesh Recovery from Video by Learning Part-Based 3D Dynamics
* Uncertainty-aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation
* Uncertainty-Guided Transformer Reasoning for Camouflaged Object Detection
* Unconditional Scene Graph Generation
* Unconstrained Scene Generation with Locally Conditioned Radiance Fields
* Understanding and Evaluating Racial Biases in Image Captioning
* Understanding and Mitigating Annotation Bias in Facial Expression Recognition
* Understanding Robustness of Transformers for Image Classification
* Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
* Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder, A
* Unified Graph Structured Models for Video Understanding
* Unified Objective for Novel Class Discovery, A
* Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue
* Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
* Unifying Nonlocal Blocks for Neural Networks
* UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
* UniT: Multimodal Multitask Learning with a Unified Transformer
* Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution
* Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
* Universal Representation Learning from Multiple Domains for Few-shot Classification
* Universal-Prototype Enhancing for Few-Shot Object Detection
* Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
* Unlocking the Potential of Ordinary Classifier: Class-specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation
* Unpaired Learning for Deep Image Deraining with Rain Direction Regularizer
* Unpaired Learning for High Dynamic Range Image Tone Mapping
* Unshuffling Data for Improved Generalization in Visual Question Answering
* Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
* Unsupervised Curriculum Domain Adaptation for No-Reference Video Quality Assessment
* Unsupervised Deep Video Denoising
* Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
* Unsupervised Depth Completion with Calibrated Backprojection Layers
* Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
* Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
* Unsupervised Image Generation with Infinite Generative Adversarial Networks
* Unsupervised Layered Image Decomposition into Object Prototypes
* Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
* Unsupervised Non-Rigid Image Distortion Removal via Grid Deformation
* Unsupervised Point Cloud Object Co-segmentation by Co-contrastive Learning and Mutual Attention Sampling
* Unsupervised Point Cloud Pre-training via Occlusion Completion
* Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
* Unsupervised Segmentation incorporating Shape Prior via Generative Adversarial Networks
* Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
* UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps
* V-DESIRR: Very Fast Deep Embedded Single Image Reflection Removal
* VaPiD: A Rapid Vanishing Point Detector via Learned Optimizers
* Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
* Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting
* Variational Feature Disentangling for Fine-Grained Few-Shot Classification
* VariTex: Variational Neural Face Textures
* Vector Neurons: A General Framework for SO(3)-Equivariant Networks
* Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
* VENet: Voting Enhancement Network for 3D Object Detection
* Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
* Video Annotation for Visual Tracking via Selection and Refinement
* Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion
* Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing
* Video Instance Segmentation with a Propose-Reduce Paradigm
* Video Matting via Consistency-Regularized Graph Neural Networks
* Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment
* Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
* Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature
* Video Self-Stitching Graph Network for Temporal Action Localization
* Video-based Person Re-identification with Spatial and Temporal Memory Networks
* VideoLT: Large-scale Long-tailed Video Recognition
* VidTr: Video Transformer Without Convolutions
* Viewing Graph Solvability via Cycle Consistency
* ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation
* Viewpoint Invariant Dense Matching for Visual Geolocalization
* Viewpoint-Agnostic Change Captioning with Cycle Consistency
* VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection
* Virtual light transport matrices for non-line-of-sight imaging
* Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
* Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
* Visformer: The Vision-friendly Transformer
* Visio-Temporal Attention for Multi-Camera Multi-Target Association
* Vision Transformer with Progressive Sampling
* Vision Transformers for Dense Prediction
* Vision-Language Navigation with Random Environmental Mixup
* Vision-Language Transformer and Query Generation for Referring Segmentation
* Visual Alignment Constraint for Continuous Sign Language Recognition
* Visual Distant Supervision for Scene Graph Generation
* Visual Graph Memory with Unsupervised Representation for Visual Navigation
* Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries
* Visual Saliency Transformer
* Visual Scene Graphs for Audio Source Separation
* Visual Transformers: Where Do Transformers Really Belong in Vision Models?
* Visual-Textual Attentive Semantic Consistency for Medical Report Generation
* ViViT: A Video Vision Transformer
* VLGrammar: Grounded Grammar Induction of Vision and Language
* VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
* VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
* von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning
* Voxel Transformer for 3D Object Detection
* Voxel-based Network for Shape Completion by Leveraging Edge Generation
* VSAC: Efficient and Accurate Estimator for H and F
* Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis
* Wanderlust: Online Continual Object Detection in the Real World
* Warp Consistency for Unsupervised Learning of Dense Correspondences
* Warp-Refine Propagation: Semi-Supervised Auto-Labeling via Cycle-Consistency
* WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
* Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
* Watch Only Once: An End-to-End Video Action Detection Framework
* WaveFill: A Wavelet-based Generation Network for Image Inpainting
* Way to my Heart is through Contrastive Learning: Remote Photoplethysmography from Unlabelled Video, The
* Waypoint Models for Instruction-guided Navigation in Continuous Environments
* WB-DETR: Transformer-Based Detector without Backbone
* Weak Adaptation Learning: Addressing Cross-domain Data Insufficiency with Weak Annotator
* Weakly Supervised 3D Semantic Segmentation Using Cross-Image Consensus and Inter-Voxel Affinity Relations
* Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation, A
* Weakly Supervised Contrastive Learning
* Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
* Weakly Supervised Person Search with Region Siamese Networks
* Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
* Weakly Supervised Representation Learning with Coarse Labels
* Weakly Supervised Segmentation of Small Buildings with Point Labels
* Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
* Weakly Supervised Text-based Person Re-Identification
* Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning
* Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
* Webly Supervised Fine-Grained Recognition: Benchmark Datasets and an Approach
* What You Can Learn by Staring at a Blank Wall
* When do GANs replicate? On the choice of dataset size
* When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes
* Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples
* Where2Act: From Pixels to Actions for Articulated 3D Objects
* Who's Waldo? Linking People Across Text and Images
* Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
* With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
* Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image
* X-World: Accessibility, Vision, and Autonomy Meet
* XVFI: eXtreme Video Frame Interpolation
* You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking
* YouRefIt: Embodied Reference Understanding with Language and Gesture
* Z-Score Normalization, Hubness, and Few-Shot Learning
* Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition
* Zero-Shot Day-Night Domain Adaptation with a Physics Prior
* Zero-shot Natural Language Video Localization
* ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
* 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision
* 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds
* 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking
* 3D Distillation: Improving Self-Supervised Monocular Depth Estimation on Reflective Surfaces
* 3D Human Mesh Recovery with Sequentially Global Rotation Estimation
* 3D Implicit Transporter for Temporally Consistent Keypoint Discovery
* 3D Instance Segmentation via Enhanced Spatial and Semantic Supervision
* 3D Motion Magnification: Visualizing Subtle Motions with Time-Varying Radiance Fields
* 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation
* 3D Segmentation of Humans in Point Clouds with Synthetic Data
* 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability
* 3D VR Sketch Guided 3D Shape Prototyping and Exploration
* 3D-aware Blending with Generative NeRFs
* 3D-Aware Generative Model for Improved Side-View Image Synthesis
* 3D-aware Image Generation using 2D Diffusion Models
* 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation
* 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
* 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack
* 3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping
* 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets
* 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
* 3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection
* 4D Myocardium Reconstruction with Decoupled Motion and Shape Model
* 4D Panoptic Segmentation as Invariant and Equivariant Field Prediction
* 5-Point Minimal Solver for Event Camera Relative Motion Estimation, A
* A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
* A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance
* Ablating Concepts in Text-to-Image Diffusion Models
* AccFlow: Backward Accumulation for Long-Range Optical Flow
* Accurate 3D Face Reconstruction with Facial Component Tokens
* Accurate and Fast Compressed Video Captioning
* Achievement-based Training Progress Balancing for Multi-Task Learning
* ACLS: Adaptive and Conditional Label Smoothing for Network Calibration
* ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
* Action Sensitivity Learning for Temporal Action Localization
* Activate and Reject: Towards Safe Domain Generalization under Category Shift
* Active Neural Mapping
* Active Stereo Without Pattern Projector
* ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion
* ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs
* Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
* AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts
* AdaNIC: Towards Practical Neural Image Compression via Dynamic Transform Routing
* ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
* Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking
* Adaptive Calibrator Ensemble: Navigating Test Set Difficulty in Out-of-Distribution Scenarios
* Adaptive Frequency Filters As Efficient Global Token Mixers
* Adaptive Illumination Mapping for Shadow Detection in Raw Images
* Adaptive Image Anonymization in the Context of Image Classification with Neural Networks
* Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability, An
* Adaptive Nonlinear Latent Transformation for Conditional Face Editing
* Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields
* Adaptive Reordering Sampler with Neurally Guided MAGSAC
* Adaptive Rotated Convolution for Rotated Object Detection
* Adaptive Similarity Bootstrapping for Self-Distillation based Representation Learning
* Adaptive Spiral Layers for Efficient 3D Representation Learning on Meshes
* Adaptive Superpixel for Active Learning in Semantic Segmentation
* Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images
* Adaptive Testing of Computer Vision Models
* Adding Conditional Control to Text-to-Image Diffusion Models
* ADNet: Lane Shape Prediction via Anchor Decomposition
* Advancing Example Exploitation Can Alleviate Critical Challenges in Adversarial Training
* Advancing Referring Expression Segmentation Beyond Single Image
* AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models
* AdVerb: Visually Guided Audio Dereverberation
* Adversarial Bayesian Augmentation for Single-Source Domain Generalization
* Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
* Adverse Weather Removal with Codebook Priors
* AerialVLN: Vision-and-Language Navigation for UAVs
* AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks
* Affective Image Filter: Reflecting Emotions from Text to Images
* Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection
* AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose
* AG3D: Learning to Generate 3D Avatars from 2D Image Collections
* AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion
* Agglomerative Transformer for Human-Object Interaction Detection
* Aggregating Feature Point Cloud for Depth Completion
* Agile Modeling: From Concept to Classifier in Minutes
* AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
* Algebraically rigorous quaternion framework for the neural network pose estimation problem
* AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
* Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation
* Alignment-free HDR Deghosting with Semantics Consistent Transformer
* ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
* All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
* All-to-key Attention for Arbitrary Style Transfer
* Alleviating Catastrophic Forgetting of Incremental Object Detection via Within-Class and Between-Class Knowledge Distillation
* ALWOD: Active Learning for Weakly-Supervised Object Detection
* Among Us: Adversarially Robust Collaborative Perception by Consensus
* Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis
* Anchor Structure Regularization Induced Multi-view Subspace Clustering via Enhanced Tensor Rank Minimization
* Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection
* Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
* Anomaly Detection under Distribution Shift
* Anomaly Detection using Score-based Perturbation Resilience
* Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
* Aperture Diffraction for Compact Snapshot Spectral Imaging
* AREA: Adaptive Reweighting via Effective Area for Long-Tailed Classification
* Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception
* ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
* ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation
* ASIC: Aligning Sparse in-the-wild Image Collections
* ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling
* AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation
* Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD
* ATT3D: Amortized Text-to-3D Object Synthesis
* Attention Discriminant Sampling for Point Clouds
* Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
* Attentive Mask CLIP
* AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
* Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
* Audio-Visual Class-Incremental Learning
* Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning
* Audio-Visual Glance Network for Efficient Video Recognition
* Audiovisual Masked Autoencoders
* Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection
* Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation
* AutoAD II: The Sequel - Who, When, and What in Movie Audio Description
* AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
* Automated Knowledge Distillation via Monte Carlo Tree Search
* Automatic Animation of Hair Blowing in Still Portrait Photos
* Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle
* AutoReP: Automatic ReLU Replacement for Fast Private Network Inference
* AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration
* Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
* AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
* Backpropagation Path Search On Adversarial Transferability
* BallGAN: 3D-aware Image Synthesis with a Spherical Background
* BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus
* BaRe-ESA: A Riemannian Framework for Unregistered Human Body Shapes
* Batch-based Model Registration for Fast 3D Sherd Reconstruction
* Bayesian Optimization Meets Self-Distillation
* Bayesian Prompt Learning for Image-Language Model Generalization
* Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples
* Beating Backdoor Attack at Its Own Game
* BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
* Benchmark for Chinese-English Scene Text Image Super-resolution, A
* Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation
* Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples
* Benchmarking Low-Shot Robustness to Natural Distribution Shifts
* Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
* Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification
* BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation
* BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
* Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition
* Beyond Object Recognition: A New Benchmark towards Object Concept Learning
* Beyond One-to-One: Rethinking the Referring Image Segmentation
* Beyond Single Path Integrated Gradients for Reliable Input Attribution via Randomized Path Sampling
* Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color
* Beyond the limitation of monocular 3D detector via knowledge distillation
* Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction
* Bidirectional Alignment for Domain Adaptive Detection with Transformers
* Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
* BiFF: Bi-level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction
* Bird's-Eye-View Scene Graph for Vision-Language Navigation
* BiViT: Extremely Compressed Binary Vision Transformers
* Black Box Few-Shot Adaptation for Vision-Language models
* Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory
* BlendFace: Re-designing Identity Encoders for Face-Swapping
* Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
* BlindHarmony: Blind Harmonization for MR Images via Flow model
* Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction
* Bold but Cautious: Unlocking the Potential of Personalized Federated Learning through Cautiously Aggressive Collaboration
* BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification
* Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer
* Boosting Adversarial Transferability via Gradient Relevance Attack
* Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
* Boosting Long-tailed Object Detection via Step-wise Learning on Smooth-tail Data
* Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
* Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All in One Classifier
* Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing
* Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings
* Boosting Single Image Super-Resolution via Partial Channel Shifting
* Boosting Whole Slide Image Classification from the Perspectives of Distribution, Correlation and Magnification
* Bootstrap Motion Forecasting With Self-Consistent Constraints
* Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm
* Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation
* Boundary-Aware Divide and Conquer: A Diffusion-based Solution for Unsupervised Shadow Removal
* Box-Based Refinement for Weakly Supervised and Unsupervised Localization Tasks
* BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
* BoxSnake: Polygonal Instance Segmentation with Box Supervision
* Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
* Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models
* Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions
* Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
* Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
* Bring Clipart to Life
* BT2: Backward-compatible Training with Basis Transformation
* Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach
* Building Bridge Across the Time: Disruption and Restoration of Murals In the Wild
* Building Vision Transformers with Hierarchy Aware Feature Aggregation
* Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds
* BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
* C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and Generalizable Neural Surface Reconstruction
* C2ST: Cross-modal Contextualized Sequence Transduction for Continuous Sign Language Recognition
* CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
* CAFA: Class-Aware Feature Alignment for Test-Time Adaptation
* Calibrating Panoramic Depth Estimation for Practical Localization and Mapping
* Calibrating Uncertainty for Semi-Supervised Crowd Counting
* Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification
* Can Language Models Learn to Listen?
* CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans
* Candidate-Aware Selective Disambiguation Based on Normalized Entropy for Instance-Dependent Partial-Label Learning
* Canonical Factors for Hybrid Neural Fields
* CaPhy: Capturing Physical Properties for Animatable Human Avatars
* Cascade-DETR: Delving into High-Quality Universal Object Detection
* CASSPR: Cross Attention Single Scan Place Recognition
* Category-aware Allocation Transformer for Weakly Supervised Object Localization
* Causal-DFQ: Causality Guided Data-free Network Quantization
* CauSSL: Causality-Inspired Semi-Supervised Learning for Medical Image Segmentation
* CBA: Improving Online Continual Learning via Continual Bias Adaptor
* CC3D: Layout-Conditioned Generation of Compositional 3D Scenes
* CDAC: Cross-domain Attention Consistency in Transformer for Domain Adaptive Semantic Segmentation
* CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
* Center-Based Decoupled Point Cloud Registration for 6D Object Pose Estimation
* CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision
* CGBA: Curvature-aware Geometric Black-box Attack
* Champagne: Learning Real-world Conversation from Large-Scale Web Videos
* Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events
* ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
* Chasing clouds: Differentiable volumetric rasterisation of point clouds as a highly efficient and accurate loss for large-scale deformable 3D registration
* CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
* ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour
* Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
* Chop & Learn: Recognizing and Generating Object-State Compositions
* Chord: Category-level Hand-held Object Reconstruction via Shape Deformation
* Chordal Averaging on Flag Manifolds and Its Applications
* CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images
* Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
* CIRI: Curricular Inactivation for Residue-aware One-shot Video Inpainting
* CiT: Curation in Training for Effective Vision-Language Data
* CiteTracker: Correlating Image and Text for Visual Tracking
* CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning
* Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery
* Class-Aware Patch Embedding Adaptation for Few-Shot Image Classification
* Class-incremental Continual Learning for Instance Segmentation with Image-level Weak Supervision
* Class-Incremental Grouping Network for Continual Audio-Visual Learning
* Class-relation Knowledge Distillation for Novel Class Discovery
* CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
* ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field
* CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering
* CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
* CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
* CLIPascene: Scene Sketching with Different Types and Levels of Abstraction
* CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
* CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
* CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
* CLNeRF: Continual Learning Meets NeRF
* Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing
* ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment
* ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose via An Indirect Recording Solution
* CLR: Channel-wise Lightweight Reprogramming for Continual Learning
* ClusT3: Information Invariant Test-Time Training
* Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds
* Clustering based Point Cloud Representation Learning for 3D Analysis
* Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting
* CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation
* Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
* CO-Net: Learning Multiple Point Cloud Tasks at Once with A Cohesive Network
* CO-PILOT: Dynamic Top-Down Point Cloud with Conditional Neighborhood Aggregation for Multi-Gigapixel Histopathology Image Representation
* Coarse-to-Fine Amodal Segmentation with Shape Prior
* Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
* COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
* Coherent Event Guided Low-Light Video Enhancement
* CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations
* CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation
* Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision
* Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking
* Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures
* Combating Noisy Labels with Sample Selection by Mining High-Discrepancy Examples
* Communication-efficient Federated Learning with Single-Step Synthetic Features Compressor for Faster Convergence
* Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples
* COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability
* Compatibility of Fundamental Matrices for Complete Viewing Graphs
* Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning
* Complete Recipe for Diffusion Generative Models, A
* Compositional Feature Augmentation for Unbiased Scene Graph Generation
* Computation and Data Efficient Backdoor Attacks
* Computational 3D Imaging with Position Sensors
* Computationally-Efficient Neural Image Compression with Shallow Decoders
* Concept-wise Fine-tuning Matters in Preventing Negative Transfer
* Conceptual and Hierarchical Latent Space Decomposition for Face Editing
* Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration
* Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding
* Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation
* Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera
* ConSlide: Asynchronous Hierarchical Interaction Transformer with Breakup-Reorganize Rehearsal for Continual Whole Slide Image Analysis
* Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells
* ContactGen: Generative Contact Modeling for Grasp Generation
* Contactless Pulse Estimation Leveraging Pseudo Labels and Self-Supervision
* Content-Aware Local GAN for Photo-Realistic Super-Resolution
* Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents
* Continual Learning for Personalized Co-Speech Gesture Generation
* Continual Segment: Towards a Single, Unified and Non-forgetting Continual Segmentation Model of 143 Whole-body Organs in CT Scans
* Continual Zero-Shot Learning through Semantically Guided Generative Random Walks
* Continuously Masked Transformer for Image Inpainting
* Contrastive Continuity on Augmentation Stability Rehearsal for Continual Self-Supervised Learning
* Contrastive Feature Masking Open-Vocabulary Vision Transformer
* Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation
* Contrastive Pseudo Learning for Open-World DeepFake Attribution
* Controllable Guide-Space for Generalizable Face Forgery Detection
* Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion
* Controllable Visual-Tactile Synthesis
* Convex Decomposition of Indoor Scenes
* Convolutional Networks with Oriented 1D Kernels
* COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec
* COOP: Decoupling and Coupling of Whole-Body Grasping Pose Generation
* Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction
* Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos
* COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos
* CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields
* CORE: Co-planarity Regularized Monocular Geometry Estimation with Weak Supervision
* Core: Cooperative Reconstruction for Multi-Agent Perception
* Corrupting Neuron Explanations of Deep Visual Features
* CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition
* CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
* Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks
* Counting Crowds in Bad Weather
* CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation
* Creative Birds: Self-Supervised Single-View 3D Style Transfer
* CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception
* CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
* Cross Contrasting Feature Perturbation for Domain Generalization
* Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
* Cross-Domain Product Representation Learning for Rich-Content E-Commerce
* Cross-modal Latent Space Alignment for Image to Avatar Translation
* Cross-Modal Learning with 3D Deformable Attention for Action Recognition
* Cross-Modal Orthogonal High-Rank Augmentation for RGB-Event Transformer-trackers
* Cross-modal Scalable Hyperbolic Hierarchical Clustering
* Cross-Modal Translation and Alignment for Survival Analysis
* Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections
* Cross-view Semantic Alignment for Livestreaming Product Recognition
* Cross-view Topology Based Consistent and Complementary Information for Deep Multi-view Clustering
* CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation
* CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
* CrossMatch: Source-Free Domain Adaptive Semantic Segmentation via Cross-Modal Consistency Training
* CSDA: Learning Category-Scale Joint Feature for Domain Adaptive Object Detection
* CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
* CTVIS: Consistent Training for Online Video Instance Segmentation
* Cumulative Spatial Knowledge Distillation for Vision Transformers
* CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution
* Curvature-Aware Training for Coordinate Networks
* CVRecon: Rethinking 3D Geometric Feature Learning For Neural Reconstruction
* CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion
* Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction
* Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection
* D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field
* D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
* DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
* Dancing in the Dark: A Benchmark towards General Low-light Video Enhancement
* DandelionNet: Domain Composition with Instance Adaptive Classification for Domain Generalization
* Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning
* DarSwin: Distortion Aware Radial Swin Transformer
* DARTH: Holistic Test-time Adaptation for Multiple Object Tracking
* Data Augmented Flatness-aware Gradient Projection for Continual Learning
* Data-Free Class-Incremental Hand Gesture Recognition
* Data-free Knowledge Distillation for Fine-grained Visual Categorization
* DataDAM: Efficient Dataset Distillation with Attention Matching
* Dataset Quantization
* DCPB: Deformable Convolution based on the Poincaré Ball for Top-view Fisheye Cameras
* DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
* DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
* DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
* DDIT: Semantic Scene Completion via Deformable Deep Implicit Templates
* DDP: Diffusion Model for Dense Visual Prediction
* DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration
* Dec-Adapter: Exploring Efficient Decoder-Side Adapter for Bridging Screen Content and Natural Image Compression
* DECO: Dense Estimation of 3D Human-Scene Contact In The Wild
* Decomposition-Based Variational Network for Multi-Contrast MRI Super-Resolution and Reconstruction
* Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering
* Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
* Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image
* DeDrift: Robust Similarity Search under Content Drift
* Deep Active Contours for Real-time 6-DoF Object Tracking
* Deep Directly-Trained Spiking Neural Networks for Object Detection
* Deep Equilibrium Object Detection
* Deep Feature Deblurring Diffusion for Detecting Out-of-Distribution Objects
* Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation
* Deep Geometrized Cartoon Line Inbetweening
* Deep geometry-aware camera self-calibration from video
* Deep Homography Mixture for Single Image Rolling Shutter Correction
* Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation
* Deep Image Harmonization with Learnable Augmentation
* Deep Incubation: Training Large Models by Divide-and-Conquering
* Deep Multitask Learning with Progressive Parameter Sharing
* Deep Multiview Clustering by Contrasting Cluster Assignments
* Deep Optics for Video Snapshot Compressive Imaging
* Deep Video Demoiréing via Compact Invertible Dyadic Decomposition
* DeepChange: A Long-Term Person Re-Identification Benchmark with Clothes Change
* DeePoint: Visual Pointing Recognition and Direction Estimation
* Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings
* Deformable Neural Radiance Fields using RGB and Event Cameras
* Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation
* DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image
* DeformToon3d: Deformable Neural Radiance Fields for 3D Toonification
* Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion
* DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds
* Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
* DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
* Delta Denoising Score
* Delving into Motion-Aware Matching for Monocular 3D Object Tracking
* Democratising 2D Sketch to 3D Shape Retrieval Through Pivoting
* Denoising Diffusion Autoencoders are Unified Self-supervised Learners
* Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation
* Dense Text-to-Image Generation with Attention Modulation
* DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization
* Density-invariant Features for Distant Point Cloud Registration
* Designing Phase Masks for Under-Display Cameras
* DETA: Denoised Task Adaptation for Few-Shot Learning
* Detecting Objects with Context-Likelihood Graphs and Graph Refinement
* Detection Transformer with Stable Matching
* DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
* DETR Does Not Need Multi-Scale or Locality Design
* DETRDistill: A Universal Knowledge Distillation Framework for DETR-families
* DETRs with Collaborative Hybrid Assignments Training
* DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
* Devil is in the Crack Orientation: A New Perspective for Crack Detection, The
* Devil is in the Upsampling: Architectural Decisions Made Simpler for Denoising with Deep Image Prior, The
* DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
* DG-Recon: Depth-Guided Neural 3D Scene Reconstruction
* DG3D: Generating High Quality 3D Textured Shapes by Learning to Discriminate Multi-Modal Diffusion-Renderings
* DiFaReli: Diffusion Face Relighting
* Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
* DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
* DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
* DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
* Differentiable Transportation Pruning
* DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
* DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
* DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models
* DiffIR: Efficient Diffusion Model for Image Restoration
* DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models
* DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation
* DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
* DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
* DiffuMask: Synthesizing Images with Pixel-Level Annotations for Semantic Segmentation Using Diffusion Models
* Diffuse3D: Wide-Angle 3D Photography via Bilateral Diffusion
* Diffusion Action Segmentation
* Diffusion in Style
* Diffusion Model as Representation Learner
* Diffusion Models as Masked Autoencoders
* Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
* Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
* Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
* Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions
* DiffusionDet: Diffusion Model for Object Detection
* DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
* DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
* DiLiGenT-Pi: Photometric Stereo for Planar Surfaces with Rich Details - Benchmark Dataset and Beyond
* DIME-FM: DIstilling Multimodal and Efficient Foundation Models
* DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars
* DIRE for Diffusion-Generated Image Detection
* Discovering Spatio-Temporal Rationales for Video Question Answering
* Discrepant and Multi-instance Proxies for Unsupervised Person Re-identification
* Discriminative Class Tokens for Text-to-Image Diffusion Models
* Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
* Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
* DISeR: Designing Imaging Systems with Reinforcement Learning
* Disposable Transfer Learning for Selective Source Task Unlearning
* DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
* Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning
* Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
* Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection
* Distilling from Similar Tasks for Transfer Learning on a Budget
* Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
* Distracting Downpour: Adversarial Weather Attacks for Motion Estimation
* Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets
* Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
* Distribution-Aligned Diffusion for Human Mesh Recovery
* Distribution-Aware Prompt Tuning for Vision-Language Models
* Distribution-Consistent Modal Recovering for Incomplete Multimodal Learning
* Diverse Cotraining Makes Strong Semi-Supervised Segmentor
* Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
* Diverse Inpainting and Editing with GAN Inversion
* Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization
* Divide and Conquer: a Two-Step Method for High Quality Face De-identification with Model Explainability
* Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition
* DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for Image Super-Resolution
* DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
* DMNet: Delaunay Meshing Network for 3D Shape Representation
* DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
* Do DALL-E and Flamingo Understand Each Other?
* DocTr: Document Transformer for Structured Information Extraction in Documents
* Document Understanding Dataset and Evaluation (DUDE)
* Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack
* DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction
* Domain Adaptive Few-Shot Open-Set Learning
* Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
* Domain generalization of 3D semantic segmentation in autonomous driving
* Domain Generalization via Balancing Training Difficulty and Model Capability
* Domain Generalization via Rationale Invariance
* Domain Specified Optimization for Deployment Authorization
* Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation
* DomainAdaptor: A Novel Approach to Test-time Adaptation
* DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization
* Doppelgangers: Learning to Disambiguate Images of Similar Structures
* DOT: A Distillation-Oriented Trainer
* Downscaled Representation Matters: Improving Image Rescaling with Collaborative Downscaled Images
* Downstream-agnostic Adversarial Examples
* DPF-Net: Combining Explicit Shape Priors in Deformable Primitive Field for Unsupervised Structural Reconstruction of 3D Objects
* DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport
* DPS-Net: Deep Polarimetric Stereo Depth Estimation
* DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection
* DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration
* DRAW: Defending Camera-shooted RAW against Image Manipulation
* DREAM: Efficient Dataset Distillation by Representative Matching
* DreamBooth3D: Subject-Driven Text-to-3D Generation
* DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
* DreamTeacher: Pretraining Image Backbones with Deep Generative Models
* Dreamwalker: Mental Planning for Continuous Vision-Language Navigation
* DReg-NeRF: Deep Registration for Neural Radiance Fields
* DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
* DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
* Dual Aggregation Transformer for Image Super-Resolution
* Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval
* Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan
* Dual Pseudo-Labels Interactive Self-Training for Semi-Supervised Visible-Infrared Person Re-Identification
* DVGaze: Dual-View Gaze Estimation
* DVIS: Decoupled Video Instance Segmentation Framework
* DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition
* Dynamic Dual-Processing Object Detection Framework Inspired by the Brain's Recognition Mechanism, A
* Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction
* Dynamic Mesh Recovery from Partial Point Cloud Sequence
* Dynamic Mesh-Aware Radiance Fields
* Dynamic Perceiver for Efficient Visual Recognition
* Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
* Dynamic Point Fields
* Dynamic Residual Classifier for Class Incremental Learning
* Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation
* Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation
* DynamicISP: Dynamically Controlled Image Signal Processor for Image Recognition
* DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
* E2E-LOAD: End-to-End Long-form Online Action Detection
* E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images
* E2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
* E3Sym: Leveraging E(3) Invariance for Unsupervised 3D Planar Reflective Symmetry Detection
* EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
* EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation
* Editable Image Geometric Abstraction via Neural Primitive Assembly
* Editing Implicit Assumptions in Text-to-Image Diffusion Models
* Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
* effectiveness of MAE pre-pretraining for billion-scale pretraining, The
* Efficient 3D Semantic Segmentation with Superpoint Transformer
* Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
* Efficient Computation Sharing for Multi-Task Visual Scene Understanding
* Efficient Controllable Multi-Task Architectures
* Efficient Converted Spiking Neural Network for 3D and 2D Classification
* Efficient Decision-based Black-box Patch Attacks on Video Recognition
* Efficient Deep Space Filling Curve
* Efficient Diffusion Training via Min-SNR Weighting Strategy
* Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond
* Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
* Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks
* Efficient LiDAR Point Cloud Oversegmentation Network
* Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
* Efficient neural supersampling on a novel gaming dataset
* Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
* Efficient Transformer-Based 3D Object Detection with Dynamic Token Halting
* Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors
* Efficient Video Action Detection with Token Dropout and Context Refinement
* Efficient Video Prediction via Sparsely Conditioned Flow Matching
* Efficient View Synthesis with Neural Radiance Distribution Field
* Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
* Efficiently Robustify Pre-Trained Models
* EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
* EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction
* EGC: Image Generation and Classification via a Diffusion Energy-Based Model
* EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
* Ego-Only: Egocentric Action Detection without Exocentric Transferring
* EgoHumans: An Egocentric 3D Multi-Human Benchmark
* EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
* EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
* EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
* EgoTV: Egocentric Task Verification from Natural Language Task Descriptions
* EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
* EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
* EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting
* ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
* ELFNet: Evidential Local-global Fusion for Stereo Matching
* ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
* Embarrassingly Simple Backdoor Attack on Self-supervised Learning, An
* EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild
* EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation
* EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes
* EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
* Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation
* Empowering Low-Light Image Enhancer through Customized Learnable Priors
* EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
* EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity
* Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
* End-to-end 3D Tracking with Decoupled Queries
* End-to-End Diffusion Latent Optimization Improves Classifier Guidance
* End2End Multi-View Feature Matching with Differentiable Pose Optimization
* Energy-based Self-Training and Normalization for Unsupervised Domain Adaptation
* Enhanced Meta Label Correction for Coping with Label Corruption
* Enhanced Soft Label for Semi-Supervised Semantic Segmentation
* Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation
* Enhancing Fine-Tuning based Backdoor Defense with Sharpness-Aware Minimization
* Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
* Enhancing Modality-Agnostic Representations via Meta-learning for Brain Tumor Segmentation
* Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
* Enhancing Non-line-of-sight Imaging via Learnable Inverse Kernel and Attention Mechanisms
* Enhancing Privacy Preservation in Federated Learning via Learning Rate Perturbation
* Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning
* ENTL: Embodied Navigation Trajectory Learner
* ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting
* Environment Agnostic Representation for Visual Reinforcement learning
* Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation
* eP-ALM: Efficient Perceptual Augmentation of Language Models
* EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization
* EPiC: Ensemble of Partial Point Clouds for Robust Classification
* EQ-Net: Elastic Quantization Neural Networks
* Equivariant Similarity for Vision-Language Foundation Models
* Erasing Concepts from Diffusion Models
* ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
* Essential Matrix Estimation using Convex Relaxations in Orthogonal Space
* ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
* Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training
* ETran: Energy-Based Transferability Estimation
* Euclidean Space is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation, The
* Eulerian Single-Photon Vision
* Evaluating Data Attribution for Text-to-Image Models
* Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks
* Event Camera Data Pre-training
* Event-based Temporally Dense Optical Flow Estimation with Sequential Learning
* Event-Guided Procedure Planning from Instructional Videos with Text Supervision
* Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
* EverLight: Indoor-Outdoor Editable HDR Lighting Estimation
* Examining Autoexposure for Challenging Scenes
* ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images
* Exemplar-Free Continual Transformer with Convolutions
* Explaining Adversarial Robustness of Neural Networks from Clustering Effect Perspective
* Explicit Motion Disentangling for Efficient Optical Flow Estimation
* Exploiting Proximity-Aware Tasks for Embodied Social Navigation
* Explore and Tell: Embodied Visual Captioning in 3D Environments
* Exploring Group Video Captioning with Efficient Relational Approximation
* Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
* Exploring Model Transferability through the Lens of Potential Energy
* Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
* Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only
* Exploring Positional Characteristics of Dual-Pixel Data for Camera Autofocus
* Exploring Predicate Visual Context in Detecting of Human-Object Interactions
* Exploring Temporal Concurrency for Video-Language Representation Learning
* Exploring Temporal Frequency Spectrum in Deep Video Deblurring
* Exploring the Benefits of Visual Prompting in Differential Privacy
* Exploring the Sim2Real Gap using Digital Twins
* Exploring Transformers for Open-world Instance Segmentation
* Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
* ExposureDiffusion: Learning to Expose for Low-light Image Enhancement
* Expressive Text-to-Image Generation with Rich Text
* Extensible and Efficient Proxy for Neural Architecture Search
* F&F Attack: Adversarial Attack against Multiple Object Trackers by Inducing False Negatives and False Positives
* Face Clustering via Graph Convolutional Networks with Confidence Edges
* FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields
* FACET: Fairness in Computer Vision Evaluation Benchmark
* Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation
* FACTS: First Amplify Correlations and Then Slice to Discover Bias
* Fan-Beam Binarization Difference Projection (FB-BDP): A Novel Local Object Descriptor for Fine-Grained Leaf Image Retrieval
* Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
* FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory
* Fast Adversarial Training with Smooth Convergence
* Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance
* Fast Full-frame Video Stabilization with Iterative Optimization
* Fast Globally Optimal Surface Normal from an Affine Correspondence
* Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction
* Fast Neural Scene Flow
* Fast Unified System for 3D Object Detection and Tracking, A
* FastRecon: Few-shot Industrial Anomaly Detection via Fast Feature Reconstruction
* FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
* FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
* FB-BEV: BEV Representation from Forward-Backward View Transformations
* FBLNet: FeedBack Loop Network for Driver Attention Prediction
* Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
* FDViT: Improve the Hierarchical Architecture of Vision Transformer
* FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision
* Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution
* Feature Prediction Diffusion Model for Video Anomaly Detection
* Feature Proliferation: the Cancer in StyleGAN and its Treatments
* FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
* Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat
* FedPD: Federated Open Set Recognition with Parameter Disentanglement
* FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning
* FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs
* FerKD: Surgical Label Adaptation for Efficient Distillation
* Few shot font generation via transferring similarity guided global style and quantization local style
* Few-Shot Common Action Localization via Cross-Attentional Fusion of Context and Temporal Dynamics
* Few-shot Continual Infomax Learning
* Few-Shot Dataset Distillation via Translative Pre-Training
* Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation
* Few-Shot Video Classification via Representation Fusion and Promotion Learning
* Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
* Fine-grained Unsupervised Domain Adaptation for Gait Recognition
* Fine-grained Visible Watermark Removal
* FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
* FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction
* Fingerprinting Deep Image Restoration Models
* First Session Adaptation: A Strong Replay-Free Baseline for Class-Incremental Learning
* FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction
* Flatness-Aware Minimization for Domain Generalization
* FLatten Transformer: Vision Transformer using Focused Linear Attention
* Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance
* FLIP: Cross-domain Face Anti-spoofing with Language Guidance
* FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis
* Focal Network for Image Restoration
* FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
* Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation
* Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection
* Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
* Foreground and Text-lines Aware Document Image Rectification
* Foreground Object Search by Distilling Composite Image Feature
* Foreground-Background Distribution Modeling Transformer for Visual Object Tracking
* Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
* Forward Flow for Novel View Synthesis of Dynamic Scenes
* FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation
* FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation
* FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation
* FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
* Frequency-aware GAN for Adversarial Manipulation Generation
* From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection
* From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels
* From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal
* FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models
* FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training
* FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation
* FSI: Frequency and Spatial Interactive Learning for Image Restoration in Under-Display Cameras
* Full-Body Articulated Human-Object Interaction
* FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration
* Fully Attentional Networks with Self-emerging Token Labeling
* FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
* G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
* GACE: Geometry Aware Confidence Enhancement for Black-box 3D Object Detectors on LiDAR-Data
* GAFlow: Incorporating Gaussian Attention into Optical Flow
* GAIT: Generating Aesthetic Indoor Tours with Deep Reinforcement Learning
* Game of Bundle Adjustment - Learning Efficient Convergence, A
* GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
* GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers
* GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes
* GECCO: Geometrically-Conditioned Point Diffusion Models
* GEDepth: Ground Embedding for Monocular Depth Estimation
* Gender Artifacts in Visual Datasets
* General Image-to-Image Translation with One-Shot Image Guidance
* General Planar Motion from a Pair of 3D Correspondences
* Generalist Framework for Panoptic Segmentation of Images and Videos, A
* Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization
* Generalizable Neural Fields as Partially Observed Neural Processes
* Generalized Differentiable RANSAC
* Generalized Few-Shot Point Cloud Segmentation Via Geometric Words
* Generalized Lightness Adaptation with Channel Selective Normalization
* Generalized Sum Pooling for Metric Learning
* Generalizing Event-Based Motion Deblurring in Real-World Scenarios
* Generalizing Neural Human Fitting to Unseen Poses with Articulated SE(3) Equivariance
* Generating Dynamic Kernels via Transformers for Lane Detection
* Generating Instance-level Prompts for Rehearsal-free Continual Learning
* Generating Realistic Images from In-the-wild Sounds
* Generating Visual Scenes from Touch
* Generative Action Description Prompts for Skeleton-based Action Recognition
* Generative Gradient Inversion via Over-Parameterized Networks in Federated Learning
* Generative Multiplane Neural Radiance for 3D-Aware Image Generation
* Generative Novel View Synthesis with 3D-Aware Diffusion Models
* Generative Prompt Model for Weakly Supervised Object Localization
* Geometric Viewpoint Learning with Hyper-Rays and Harmonics Encoding
* Geometrized Transformer for Self-Supervised Homography Estimation
* Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction
* GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
* GeoUDF: Surface Reconstruction from 3D Point Clouds via Geometry-guided Distance Representation
* GePSAn: Generative Procedure Step Anticipation in Cooking Videos
* Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation
* Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors
* GeT: Generative Target Structure Debiasing for Domain Adaptation
* GET: Group Event Transformer for Event-Based Vision
* GETAvatar: Generative Textured Meshes for Animatable Human Avatars
* GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization
* GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video
* Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
* Global Balanced Experts for Federated Long-Tailed Learning
* Global Features are All You Need for Image Retrieval and Reranking
* Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
* Global Perception Based Autoregressive Neural Processes
* GlobalMapper: Arbitrary-Shaped Urban Layout Generation
* Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining
* GlowGAN: Unsupervised Learning of HDR Images from LDR Images in the Wild
* GlueGen: Plug and Play Multi-Modal Encoders for X-to-Image Generation
* GlueStick: Robust Image Matching by Sticking Points and Lines Together
* GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
* Going Beyond Nouns With Vision & Language Models Using Synthetic Data
* Going Denser with Open-Vocabulary Part Segmentation
* Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation, A
* GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds
* GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning
* GPGait: Generalized Pose-based Gait Recognition
* Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection
* Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
* Gram-based Attentive Neural Ordinary Differential Equations Network for Video Nystagmography Classification
* GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds
* Gramian Attention Heads are Strong yet Efficient Vision Learners
* Graph Matching with Bi-level Noisy Correspondence
* GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection
* GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation
* Graphics2RAW: Mapping Computer Graphics Images to Sensor RAW Images
* GridMM: Grid Memory Map for Vision-and-Language Navigation
* GridPull: Towards Scalability in Learning Implicit Representations from 3D Point Clouds
* Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
* Grounded Image Text Matching with Mismatched Relation Reasoning
* Grounding 3D Object Affordance from 2D Interactions in Images
* Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
* Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation
* GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
* Growing a Brain with Sparsity-Inducing Generation for Continual Learning
* Guided Motion Diffusion for Controllable Human Motion Synthesis
* Guiding image captioning models toward more specific captions
* Guiding Local Feature Matching with Surface Curvature
* H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
* HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
* HairNeRF: Geometry-Aware Image Synthesis for Hairstyle Transfer
* HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling
* HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
* HandR2N2: Iterative 3D Hand Pose Estimation Using a Residual Recurrent Neural Network
* Handwritten and Printed Text Segmentation: A Signature Case Study
* Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient
* Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
* Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning
* Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time
* HDG-ODE: A Hierarchical Continuous-Time Model for Human Pose Forecasting
* Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
* Heterogeneous Diversity Driven Active Learning for Multi-Object Tracking
* Heterogeneous Forgetting Compensation for Class-Incremental Learning
* Hidden Biases of End-to-End Driving Models
* Hiding Visual Information via Obfuscating Adversarial Perturbations
* Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection
* Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models
* Hierarchical Point-Based Active Learning for Semi-Supervised Point Cloud Semantic Segmentation
* Hierarchical Prior Mining for Non-local Multi-View Stereo
* Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
* Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection
* Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
* Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition
* HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details
* High Quality Entity Segmentation
* High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
* HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation
* HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
* HiVLP: Hierarchical Interactive Video-Language Pre-Training
* HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer
* HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations
* Holistic Geometric Feature Learning for Structured Reconstruction
* Holistic Label Correction for Noisy Multi-Label Classification
* HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation
* HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
* Homeomorphism Alignment for Unsupervised Domain Adaptation
* Homography Guided Temporal Fusion for Road Line and Marking Segmentation
* HopFIR: Hop-wise GraphFormer with Intragroup Joint Refinement for 3D Human Pose Estimation
* HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
* Householder Projector for Unsupervised Latent Semantics Discovery
* How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability
* How Much Temporal Long-Term Context is Needed for Action Segmentation?
* How to Boost Face Recognition with StyleGAN?
* How to choose your best allies for a transferable attack?
* HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
* HSE: Hybrid Species Embedding for Deep Metric Learning
* HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
* HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation
* Human from Blur: Human Pose Tracking from Blurry Images
* Human Part-wise 3D Motion Context Learning for Sign Language Recognition
* Human Preference Score: Better Aligning Text-to-image Models with Human Preference
* Human-Centric Scene Understanding for 3D Large-Scale Scenarios
* Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation
* HumanMAC: Masked Motion Completion for Human Motion Prediction
* Humans in 4D: Reconstructing and Tracking Humans with Transformers
* HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
* Hybrid Spectral Denoising Transformer with Guided Attention
* HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness
* Hyperbolic Audio-visual Zero-shot Learning
* Hyperbolic Chamfer Distance for Point Cloud Completion
* HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion
* HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
* I can't believe there's no images!: Learning Visual Tasks Using Only Language Supervision
* I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
* ICD-Face: Intra-class Compactness Distillation for Face Recognition
* ICE-NeRF: Interactive Color Editing of NeRFs via Decomposition-Aware Weight Optimization
* ICICLE: Interpretable Class Incremental Continual Learning
* ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
* iDAG: Invariant DAG Searching for Domain Generalization
* Identification of Systematic Errors of Image Classifiers on Rare Subgroups
* Identity-Consistent Aggregation for Video Object Detection
* Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification
* IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models
* IHNet: Iterative Hierarchical Network Guided by High-Resolution Estimated Information for Scene Flow Estimation
* IIEU: Rethinking Neural Feature Activation from Decision-Making
* Image-free Classifier Injection for Zero-Shot Classification
* ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition
* ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection
* Imitator: Personalized Speech-driven 3D Facial Animation
* Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning
* Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation
* Implicit Neural Representation for Cooperative Low-light Image Enhancement
* Implicit Temporal Modeling with Learnable Alignment for Video Recognition
* Improved Knowledge Transfer for Semi-supervised Domain Adaptation via Trico Training Strategy
* Improved Visual Fine-tuning with Natural Language Supervision
* Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models
* Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
* Improving CLIP Fine-tuning Performance
* Improving Continuous Sign Language Recognition with Cross-Lingual Signs
* Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations
* Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors
* Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation
* Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning
* Improving Lens Flare Removal with General-Purpose Pipeline and Multiple Light Sources Recovery
* Improving Online Lane Graph Extraction by Object-Lane Clustering
* Improving Pixel-based MIM by Reducing Wasted Modeling Capability
* Improving Representation Learning for Histopathologic Images with Cluster Constraints
* Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
* Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints
* In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
* Incremental Generalized Category Discovery
* Indoor Depth Recovery Based on Deep Unfolding with Non-Local Prior
* Inducing Neural Collapse to a Fixed Hierarchy-Aware Frame for Reducing Mistake Severity
* InfiniCity: Infinite-Scale City Synthesis
* Informative Data Mining for One-shot Cross-Domain Semantic Segmentation
* Inherent Redundancy in Spiking Neural Networks
* Innovating Real Fisheye Image Correction with Dual Diffusion Architecture
* Inspecting the Geographical Representativeness of Images from Text-to-Image Models
* INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold
* Instance and Category Supervision are Alternate Learners for Continual Learning
* Instance Neural Radiance Field
* Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
* Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
* INT2: Interactive Trajectory Prediction at Intersections
* Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
* Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
* IntentQA: Context-aware Video Intent Reasoning
* Inter-Realization Channels: Unsupervised Anomaly Detection Beyond One-Class Classification
* Interaction-aware Joint Attention Estimation Using People Attributes
* Interactive Class-Agnostic Object Counting
* InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
* InterFormer: Real-time Interactive Image Segmentation
* IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis
* Introducing Language Guidance in Prompt-based Continual Learning
* Invariant Feature Regularization for Fair Face Recognition
* Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
* Inverse Compositional Learning for Weakly-supervised Relation Grounding
* Inverse problem regularization with hierarchical variational autoencoders
* IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization
* Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training
* Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
* IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation
* Iterative Denoiser and Noise Estimator for Self-Supervised Image Denoising
* Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
* Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution
* Iterative Superquadric Recomposition of 3D Objects from Multiple Views
* ITI-Gen: Inclusive Text-to-Image Generation
* iVS-Net: Learning Human View Synthesis from Internet Videos
* Joint Demosaicing and Deghosting of Time-Varying Exposures for Single-Shot HDR Imaging
* Joint Implicit Neural Representation for High-fidelity and Compact Vector Fonts
* Joint Metrics Matter: A Better Standard for Trajectory Forecasting
* Joint-Relation Transformer for Multi-Person Motion Prediction
* JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
* Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
* Kecor: Kernel Coding Rate Maximization for Active 3D Object Detection
* Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?
* Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV
* Knowing Where to Focus: Event-aware Transformer for Video Grounding
* Knowledge Proxy Intervention for Deconfounded Video Question Answering
* Knowledge Restore and Transfer for Multi-Label Class-Incremental Learning
* Knowledge-Aware Federated Active Learning with Non-IID Data
* Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
* Knowledge-Spreader: Learning Semi-Supervised Facial Action Dynamics by Consistifying Knowledge Granularity
* L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning
* LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise
* Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts
* Label-Efficient Online Continual Object Detection in Streaming Video
* Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events
* Label-Guided Knowledge Distillation for Continual Semantic Segmentation on 2D Images and 3D Point Clouds
* Label-Noise Learning with Intrinsically Long-Tailed Data
* LAC: Latent Action Composition for Skeleton-based Action Segmentation
* LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction
* Landscape Learning for Neural Network Inversion
* LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization
* Large Selective Kernel Network for Remote Sensing Object Detection
* Large-Scale Land Cover Mapping with Fine-Grained Classes via Class-Aware Semi-Supervised Semantic Segmentation
* Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction, A
* Large-Scale Person Detection and Localization using Overhead Fisheye Cameras
* Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition, A
* LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark
* Late Stopping: Avoiding Confidently Learning from Mislabeled Examples
* Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance, A
* Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition
* LATR: 3D Lane Detection from Monocular Images with Transformer
* LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
* LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
* LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
* LDL: Line Distance Functions for Panoramic Localization
* LDP-Feat: Image Features with Local Differential Privacy
* LEA2: A Lightweight Ensemble Adversarial Attack via Non-overlapping Vulnerable Frequency Regions
* LeaF: Learning Frames for 4D Point Cloud Sequence Understanding
* Leaping Into Memories: Space-Time Deep Feature Synthesis
* Learn TAROT with MENTOR: A Meta-Learned Self-supervised Approach for Trajectory Prediction
* Learned Compressive Representations for Single-Photon 3D Imaging
* Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion
* Learning a More Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection
* Learning A Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation
* Learning Adaptive Neighborhoods for Graph Neural Networks
* Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification
* Learning Concise and Descriptive Attributes for Visual Recognition
* Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification
* Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction
* Learning Correction Filter via Degradation-Adaptive Regression for Blind Single Image Super-Resolution
* Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples
* Learning Cross-Representation Affinity Consistency for Sparsely Supervised Biomedical Instance Segmentation
* Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
* Learning Depth Estimation for Transparent and Mirror Surfaces
* Learning Fine-Grained Features for Pixel-wise Video Correspondences
* Learning Foresightful Dense Visual Affordance for Deformable Object Manipulation
* Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization
* Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition
* Learning Gabor Texture Features for Fine-Grained Recognition
* Learning Global-aware Kernel for Image Harmonization
* Learning Hierarchical Features with Joint Latent Space Energy-Based Prior
* Learning Human Dynamics in Autonomous Driving Scenarios
* Learning Human-Human Interactions in Images from Weak Textual Supervision
* Learning Image Harmonization in the Linear Color Space
* Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration
* Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels
* Learning Long-range Information with Dual-Scale Transformers for Indoor Scene Completion
* Learning Navigational Visual Representations with Semantic Map Supervision
* Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation
* Learning Neural Implicit Surfaces with Object-Aware Radiance Fields
* Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution
* Learning Optical Flow from Event Camera with Rendered Dataset
* Learning Point Cloud Completion without Complete Point Clouds: A Pose-Aware Approach
* Learning Pseudo-Relations for Cross-domain Semantic Segmentation
* Learning Rain Location Prior for Nighttime Deraining
* Learning Robust Representations with Information Bottleneck and Memory Network for RGB-D-based Gesture Recognition
* Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
* Learning Shape Primitives via Implicit Convexity Regularization
* Learning Spatial-Context-Aware Global Visual Feature Representation for Instance Image Retrieval
* Learning Support and Trivial Prototypes for Interpretable Image Classification
* Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation
* Learning to Distill Global Representation for Sparse-View CT
* Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
* Learning to Ground Instructional Articles in Videos through Narrations
* Learning to Identify Critical States for Reinforcement Learning from Videos
* Learning to Learn: How to Continuously Teach Humans and Machines
* Learning to Transform for Generalizable Instance-wise Invariance
* Learning to Upsample by Learning to Sample
* Learning Trajectory-Word Alignments for Video-Language Tasks
* Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis
* Learning Versatile 3D Shape Generation with Improved Auto-regressive Models
* Learning Vision-and-Language Navigation from YouTube Videos
* Learning with Diversity: Self-Expanded Equalization for Better Generalized Deep Metric Learning
* Learning with Noisy Data for Semi-Supervised 3D Object Detection
* Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos
* Lens Parameter Estimation for Realistic Depth of Field Modeling
* LERF: Language Embedded Radiance Fields
* Less is More: Focus Attention for Efficient DETR
* Leveraging Inpainting for Single-Image Shadow Removal
* Leveraging Intrinsic Properties for Non-Rigid Garment Alignment
* Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly
* Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition
* LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval
* LFS-GAN: Lifelong Few-Shot Image Generation
* LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment
* LiDAR-UDA: Self-ensembling Through Time for Unsupervised LiDAR Domain Adaptation
* LightDepth: Single-View Depth Self-Supervision from Illumination Decline
* LightGlue: Local Feature Matching at Light Speed
* Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising
* Lighting up NeRF via Unsupervised Decomposition and Enhancement
* Lightweight Image Super-Resolution with Superpixel Token Interaction
* LIMITR: Leveraging Local Information for Medical Image-Text Representation
* Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
* Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation
* Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
* Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
* LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction
* LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
* LiveHand: Real-time and Photorealistic Neural Hand Rendering
* LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation
* LivePose: Online 3D Reconstruction from Monocular Video with Dynamic Camera Poses
* LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
* LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution
* LNPL-MIL: Learning from Noisy Pseudo Labels for Promoting Multiple Instance Learning in Whole Slide Image
* Local and Global Logit Adjustments for Long-Tailed Learning
* Local Context-Aware Active Domain Adaptation
* Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels
* Localizing Moments in Long Video Via Multimodal Guidance
* Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
* Locally Stylized Neural Radiance Fields
* Locating Noise is Halfway Denoising for Semi-Supervised Segmentation
* Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments
* LoCUS: Learning Multiscale 3D-consistent Features from Posed Images
* Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation
* LogicSeg: Parsing Visual Semantics with Neural Logic Learning and Reasoning
* LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models
* LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference
* Long-Range Grouping Transformer for Multi-View 3D Reconstruction
* Long-range Multimodal Pretraining for Movie Understanding
* Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
* Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
* Lossy and Lossless (L2) Post-training Model Size Compression
* LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding
* Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network
* Low-Light Image Enhancement with Multi-stage Residue Quantization and Brightness-aware Attention
* Low-Shot Object Counting Network With Iterative Prototype Adaptation, A
* LPFF: A Portrait Dataset for Face Generators Across Large Poses
* LRRU: Long-short Range Recurrent Updating Networks for Depth Completion
* LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs
* Luminance-aware Color Transform for Multiple Exposure Correction
* LVOS: A Benchmark for Long-term Video Object Segmentation
* M2T: Masking Transformers Twice for Faster Decoding
* MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects
* MAGI: Multi-Annotated Explanation-Guided Learning
* MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
* Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding
* Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
* Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
* Making and Breaking of Camouflage, The
* MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
* Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations
* MAP: Towards Balanced Generalization of IID and OOD through Model-Agnostic Adapters
* MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning
* MapFormer: Boosting Change Detection by Using Pre-change Information
* MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models
* March in Chat: Interactive Prompting for Remote Embodied Referring Expression
* Markov Game Video Augmentation for Action Segmentation
* MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation
* MAS: Towards Resource-Efficient Federated Multiple-Task Learning
* MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
* Mask-Attention-Free Transformer for 3D Instance Segmentation
* Masked Autoencoders are Efficient Class Incremental Learners
* Masked Autoencoders Are Stronger Knowledge Distillers
* Masked Diffusion Transformer is a Strong Image Synthesizer
* Masked Motion Predictors are Strong 3D Action Representation Learners
* Masked Retraining Teacher-Student Framework for Domain Adaptive Object Detection
* Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos
* Masked Spiking Transformer
* MasQCLIP for Open-Vocabulary Universal Image Segmentation
* Mastering Spatial Graph Prediction of Road Networks
* MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
* MATE: Masked Autoencoders are Online 3D Test-Time Learners
* MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
* MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception
* MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
* MBPTrack: Improving 3D Point Cloud Tracking with Memory networks and Box Priors
* MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition
* Measuring Asymmetric Gradient Discrepancy in Parallel Continual Learning
* MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis
* MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion
* MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation
* Membrane Potential Batch Normalization for Spiking Neural Networks
* Memory-and-Anticipation Transformer for Online Action Understanding
* MemorySeg: Online LiDAR Semantic Segmentation with a Latent Memory
* MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
* Mesh2Tex: Generating Mesh Textures from Image Queries
* Meta OOD Learning For Continuously Adaptive OOD Detection
* Meta-ZSDETR: Zero-Shot DETR with Meta-Learning
* MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation
* MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
* MetaGCD: Learning to Continually Learn in Generalized Category Discovery
* Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image
* MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
* MGMAE: Motion Guided Masking for Video Masked Autoencoding
* MHCN: A Hyperbolic Neural Network Model for Multi-view Hierarchical Clustering
* MHEntropy: Entropy Meets Multiple Hypotheses for Pose and Shape Recovery
* MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices
* Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation
* MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields
* Minimal Solutions to Generalized Three-View Relative Pose Problem
* Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles
* Minimum Latency Deep Online Video Stabilization
* Mining bias-target Alignment from Voronoi Cells
* MiniROAD: Minimal RNN Framework for Online Action Detection
* Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
* Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
* Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
* MixBag: Bag-Level Data Augmentation for Learning from Label Proportions
* MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency
* Mixed Neural Voxels for Fast Multi-view Video Synthesis
* MixPath: A Unified Approach for One-shot Neural Architecture Search
* MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
* MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
* MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation
* MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
* MMVP: Motion-Matrix-based Video Prediction
* MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions
* Modality Unifying Network for Visible-Infrared Person Re-Identification
* Model Calibration in Dense Classification with Adaptive Label Perturbation
* ModelGiF: Gradient Fields for Model Functional Distance
* Modeling the Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition
* MolGrapher: Graph-based Visual Recognition of Chemical Structures
* Moment Detection in Long Tutorial Videos
* Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
* MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
* MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection
* MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
* Monte Carlo Linear Clustering with Single-Point Supervision is Enough for Infrared Small Target Detection
* MoreauGrad: Sparse and Robust Interpretation of Neural Networks via Moreau Envelope
* MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
* Most Important Person-guided Dual-branch Cross-Patch Attention for Group Affect Recognition
* MOST: Multiple Object localization with Self-supervised Transformers for object discovery
* MoTIF: Learning Motion Trajectories with Local Implicit Neural Functions for Continuous Space-Time Video Super-Resolution
* Motion-Guided Masking for Spatiotemporal Representation Learning
* MotionBERT: A Unified Perspective on Learning Human Motion Representations
* MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos with Spherical Buffers and Padded Convolutions
* MotionLM: Multi-Agent Motion Forecasting as Language Modeling
* Movement Enhancement toward Multi-Scale Video Feature Representation for Temporal Action Detection
* MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
* MPI-Flow: Learning Realistic Optical Flow with Multiplane Images
* MRM: Masked Relation Modeling for Medical Image Pre-Training with Genetics
* MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
* MSI: Maximize Support-Set Information for Few-Shot Segmentation
* MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree
* mu-Split: image decomposition for fluorescence microscopy
* MULLER: Multilayer Laplacian Resizer for Vision
* Multi-body Depth and Camera Pose Estimation from Multiple Views
* Multi-Directional Subspace Editing in Style-Space
* Multi-event Video-Text Retrieval
* Multi-Frequency Representation Enhancement with Privilege Information for Video Super-Resolution
* Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
* Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation
* Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation
* Multi-label affordance mapping from egocentric vision
* Multi-Label Knowledge Distillation
* Multi-Label Self-Supervised Learning with Scene Images
* Multi-metrics adaptively identifies backdoors in Federated learning
* Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation
* Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
* Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor
* Multi-Object Discovery by Low-Dimensional Object Motion
* Multi-Object Navigation with dynamically learned neural implicit representations
* Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud Based Scene Flow Estimation
* Multi-scale Residual Low-Pass Filter Network for Image Deblurring
* Multi-Task Learning with Knowledge Distillation for Dense Prediction
* Multi-task View Synthesis with Neural Radiance Fields
* Multi-View Active Fine-Grained Visual Recognition
* Multi-view Self-supervised Disentanglement for General Image Denoising
* Multi-view Spectral Polarization Propagation for Video Glass Segmentation
* Multi-weather Image Restoration via Domain Translation
* Multi3DRefer: Grounding Text Description to Multiple 3D Objects
* Multidimensional Analysis of Social Biases in Vision Transformers, A
* Multimodal Distillation for Egocentric Action Recognition
* Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
* Multimodal High-order Relation Transformer for Scene Boundary Detection
* Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
* Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
* Multimodal Variational Auto-encoder based Audio-Visual Segmentation
* Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification
* Multiple Planar Object Tracking
* Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering
* Multiscale Structure Guided Diffusion for Image Deblurring
* Muscles in Action
* MUter: Machine Unlearning on Adversarially Trained Models
* MUVA: A New Large-Scale Benchmark for Multi-view Amodal Instance Segmentation in the Shopping Scenario
* MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving
* MV-Map: Offboard HD Map Generation with Multi-view Consistency
* MVPSNet: Fast Generalizable Multi-view Photometric Stereo
* Name Your Colour For the Task: Artificially Discover Colour Naming via Colour Quantisation Transformer
* NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning
* Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning
* Navigating to Objects Specified by Images
* NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation
* NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects
* NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space
* NDDepth: Normal-Distance Assisted Monocular Depth Estimation
* Nearest Neighbor Guidance for Out-of-Distribution Detection
* Neglected Free Lunch: Learning Image Classifiers Using Annotation Byproducts
* NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation
* NeMF: Inverse Volume Rendering with Neural Microflake Field
* NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects
* NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
* NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
* NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
* NeRF-MS: Neural Radiance Fields with Multi-Sequence
* NerfAcc: Efficient Sampling Accelerates NeRFs
* Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs
* NeRFrac: Neural Radiance Fields through Refractive Surface
* NeSS-ST: Detecting Good and Stable Keypoints with a Neural Stability Score and the Shi-Tomasi detector
* NeTO: Neural Reconstruction of Transparent Objects with Self-Occlusion Aware Refraction-Tracing
* Neural Collage Transfer: Artistic Reconstruction via Material Manipulation
* Neural Deformable Models for 3D Bi-Ventricular Heart Shape Reconstruction and Modeling from 2D Sparse Cardiac Magnetic Resonance Imaging
* Neural Fields for Structured Lighting
* Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction
* Neural Implicit Surface Evolution
* Neural Interactive Keypoint Detection
* Neural LiDAR Fields for Novel View Synthesis
* Neural Microfacet Fields for Inverse Rendering
* Neural Radiance Fields with LiDAR Maps
* Neural Reconstruction of Relightable Human Model from Monocular Video
* Neural Video Depth Stabilizer
* Neural-PBIR Reconstruction of Shape, Material, and Illumination
* NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
* NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
* NIR-assisted Video Enhancement via Unpaired 24-hour Data
* NLOS-NeuS: Non-line-of-sight Neural Implicit Surface
* No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier
* Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
* Noise2Info: Noisy Image to Information of Noise for Self-Supervised Image Denoising
* Non-Coaxial Event-guided Motion Deblurring with Spatial Alignment
* Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression
* Nonrigid Object Contact Estimation With Regional Unwrapping Transformer
* Normalizing Flows for Human Pose Anomaly Detection
* Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
* Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation
* Not Every Side Is Equal: Localization Uncertainty Estimation for Semi-Supervised 3D Object Detection
* Novel Scenes & Classes: Towards Adaptive Open-set Object Detection
* Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views
* NPC: Neural Point Characters from Video
* NSF: Neural Surface Fields for Human Modeling from Monocular Depth
* Object as Query: Lifting any 2D Object Detector to 3D Detection
* Object-aware Gaze Target Detection
* Object-Centric Multiple Object Tracking
* ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion
* Objects do not disappear: Video object detection by single-frame object location anticipation
* ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces
* Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions
* OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
* OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision
* OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes
* Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation
* OmniLabel: A Challenging Benchmark for Language-Based Object Detection
* OmnimatteRF: Robust Omnimatte with 3D Background Modeling
* OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution
* On the Audio-visual Synchronization for Lip-to-Speech Synthesis
* On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement
* On the Robustness of Normalizing Flows for Inverse Problems in Imaging
* On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion
* Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection
* One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
* One-Shot Generative Domain Adaptation
* One-shot Implicit Animatable Avatars with Model-based Priors
* One-shot recognition of any material anywhere using contrastive learning with physics-based rendering
* Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning
* Online Clustered Codebook
* Online Continual Learning on Hierarchical Label Expansion
* Online Prototype Learning for Online Continual Learning
* OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
* Open Set Video HOI detection from Action-centric Chain-of-Look Prompting
* Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
* Open-Vocabulary Object Detection With an Open Corpus
* Open-vocabulary Object Segmentation with Diffusion Models
* Open-vocabulary Panoptic Segmentation with Embedding Modulation
* Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
* Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models
* OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
* OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
* Optimizing the Placement of Roadside LiDARs for Autonomous Driving
* ORC: Network Group-based Knowledge Distillation using Online Role Change
* Ord2Seq: Regarding Ordinal Regression as Label Sequence Prediction
* Order-preserving Consistency Regularization for Domain Adaptation and Generalization
* Order-Prompted Tag Sequence Generation for Video Tagging
* Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding
* Ordinal Label Distribution Learning
* OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs
* Out-of-Distribution Detection for Monocular Depth Estimation
* Out-of-domain GAN inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation
* Overcoming Forgetting Catastrophe in Quantization-Aware Training
* Overwriting Pretrained Bias with Finetuning Data
* OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
* P1AC: Revisiting Absolute Pose From a Single Affine Correspondence
* P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds
* PADCLIP: Pseudo-labeling with Adaptive Debiasing in CLIP for Unsupervised Domain Adaptation
* PADDLES: Phase-Amplitude Spectrum Disentangled Early Stopping for Learning with Noisy Labels
* Pairwise Similarity Learning is SimPLE
* PanFlowNet: A Flow-Based Deep Network for Pan-sharpening
* Panoramas from Photons
* Parallax-Tolerant Unsupervised Deep Image Stitching
* Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition
* Parameterized Cost Volume for Stereo Matching
* Parametric Classification for Generalized Category Discovery: A Baseline Study
* Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View
* Parametric Information Maximization for Generalized Category Discovery
* ParCNetV2: Oversized Kernel with Enhanced Attention
* PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis
* PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
* Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions, A
* Part-Aware Transformer for Generalizable Person Re-identification
* Partition Speeds Up Learning Implicit Neural Representations Based on Exponential-Increase Hypothesis
* Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts
* PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
* Passive Ultra-Wideband Single-Photon Imaging
* Pasta: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization
* PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification
* PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face inpainting
* PC-Adapter: Topology-Aware Adapter for Efficient Domain Adaption on Point Clouds with Rectified Pseudo-label
* PDiscoNet: Semantically consistent part discovery for fine-grained recognition
* PEANUT: Predicting and Navigating to Unseen Targets
* Perceptual Artifacts Localization for Image Synthesis Tasks
* Perceptual Grouping in Contrastive Vision-Language Models
* Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning, The
* Periodically Exchange Teacher-Student for Source-Free Object Detection
* Perpetual Humanoid Control for Real-time Simulated Avatars
* Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction
* Person Re-Identification without Identification via Event Anonymization
* Personalized Image Generation for Color Vision Deficiency Population
* Personalized Semantics Excitation for Federated Image Classification
* PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
* PG-RCNN: Semantic Surface Point Generation for 3D Object Detection
* PGFed: Personalize Each Client's Global Objective for Federated Learning
* PhaseMP: Robust 3D Pose Estimation via Phase-conditioned Human Motion Prior
* Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
* PHRIT: Parametric Hand Representation with Implicit Template
* PhysDiff: Physics-Guided Human Motion Diffusion Model
* Physically-plausible illumination distribution estimation
* Physics-Augmented Autoencoder for 3D Skeleton-Based Gait Recognition
* Physics-Driven Turbulence Image Restoration with Stochastic Refinement
* PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval
* PIRNet: Privacy-Preserving Image Restoration Network via Wavelet Lifting
* PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction
* Pix2Video: Video Editing using Image Diffusion
* Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction
* Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
* Pixel-Wise Contrastive Distillation
* PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking
* PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
* PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs
* Plausible Uncertainties for Human Pose Regression
* Pluralistic Aging Diffusion Autoencoder
* PNI: Industrial Anomaly Detection using Position and Neighborhood Information
* PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion
* Poincaré ResNet
* Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos
* Point-Query Quadtree for Crowd Counting, Localization, and More
* Point-SLAM: Dense Neural Point Cloud-based SLAM
* Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning
* Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport
* PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
* PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering
* PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration
* PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking
* PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning
* Ponder: Point Cloud Pre-training via Neural Rendering
* Pose-Free Neural Radiance Fields via Implicit Pose Regularization
* PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
* PoseFix: Correcting 3D Human Poses with Natural Language
* PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring
* Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion, The
* PPR: Physically Plausible Reconstruction from Monocular Videos
* Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
* PRANC: Pseudo RAndom Networks for Compacting deep models
* Pre-training Vision Transformers with Very Limited Synthesized Images
* Pre-training-free Image Manipulation Localization through Non-Mutually Exclusive Contrastive Learning
* Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images
* Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis
* Preparing the Future for Continual Semantic Segmentation
* Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
* Preserving Modality Structure Improves Multi-Modal Learning
* Preserving Tumor Volumes for Unsupervised Medical Image Registration
* PreSTU: Pre-Training for Scene-Text Understanding
* Pretrained Language Models as Visual Planners for Human Assistance
* Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
* Prior-guided Source-free Domain Adaptation for Human Pose Estimation
* PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
* Priority-Centric Human Motion Generation in Discrete Latent Space
* Privacy Preserving Localization via Coordinate Permutations
* Privacy-Preserving Face Recognition Using Random Frequency Components
* Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
* Probabilistic Modeling of Inter- and Intra-observer Variability in Medical Image Segmentation
* Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models
* Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation
* ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
* Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval
* Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
* Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
* Prompt-aligned Gradient for Prompt Tuning
* PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3
* PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
* ProPainter: Improving Propagation and Transformer for Video Inpainting
* ProtoFL: Unsupervised Federated Learning via Prototypical Distillation
* ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation
* Prototype Reminiscence and Augmented Asymmetric Knowledge Aggregation for Non-Exemplar Class-Incremental Learning
* Prototype-based Dataset Comparison
* Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation
* Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery
* Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
* Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation
* Pseudo-Label Alignment for Semi-Supervised Instance Segmentation
* PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework
* Pyramid Dual Domain Injection Network for Pan-sharpening
* PØDA: Prompt-driven Zero-shot Domain Adaptation
* Q-Diffusion: Quantizing Diffusion Models
* QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection
* Quality Diversity for Visual Pre-Training
* Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning
* Query Refinement Transformer for 3D Instance Segmentation
* Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation
* R-Pred: Two-Stage Motion Prediction Via Tube-Query Attention-Based Trajectory Refinement
* R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras
* RANA: Relightable Articulated Neural Avatars
* Random Boxes Are Open-world Object Detectors
* Random Sub-Samples Generation for Self-Supervised Real Image Denoising
* Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
* RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels
* RankMixup: Ranking-Based Mixup Training for Network Calibration
* Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right?
* Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
* RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image
* Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation
* RbA: Segmenting Unknown Regions Rejected by All
* RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning
* Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection
* Re-ReND: Real-time Rendering of NeRFs across Devices
* Re:PolyWorld - A Graph Neural Network for Polygonal Scene Parsing
* ReactioNet: Learning High-order Facial Behavior from Universal Stimulus-Reaction by Dyadic Relation Reasoning
* Read-only Prompt Optimization for Vision-Language Few-shot Learning
* Real-Time Neural Rasterization for Large Scenes
* RealGraph: A Multiview Dataset for 4D Real-world Context Graph Generation
* Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
* REAP: A Large-Scale Realistic Adversarial Patch Benchmark
* Reconciling Object-Level and Global-Level Objectives for Long-Tail Detection
* Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution
* Reconstructing Groups of People with Hypergraph Relational Reasoning
* Reconstructing Interacting Hands with Interaction Prior from Monocular Images
* Recovering a Molecule's 3D Dynamics from Liquid-phase Electron Microscopy Movies
* RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning
* Recursive Video Lane Detection
* RecursiveDet: End-to-End Region-based Recursive Object Detection
* RED-PSM: Regularization by Denoising of Partially Separable Models for Dynamic Imaging
* Reducing Training Time in Cross-Silo Federated Learning using Multigraph Topology
* Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection
* RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
* Reference-guided Controllable Inpainting of Neural Radiance Fields
* Referring Image Segmentation Using Text Supervision
* ReFit: Recurrent Fitting Network for 3D Human Recovery
* ReGen: A good Generative zero-shot video classifier should be Rewarded
* RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration
* Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models
* Regularized Primitive Graph Learning for Unified Vector Mapping
* Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less
* Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
* Reinforced Disentanglement for Face Swapping without Skip Connection
* ReLeaPS: Reinforcement Learning-based Illumination Planning for Generalized Photometric Stereo
* Relightify: Relightable 3D Faces from a Single Image via Diffusion Models
* Remembering Normality: Memory-guided Knowledge Distillation for Unsupervised Anomaly Detection
* ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
* Removing Anomalies as Noises for Industrial Defect Localization
* RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation
* Rendering Humans from Object-Occluded Monocular Videos
* ReNeRF: Relightable Neural Radiance Fields with Nearfield Lighting
* Replay: Multi-modal Multi-view Acted Videos for Casual Holography
* RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
* Representation Disparity-aware Distillation for 3D Object Detection
* Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
* ResQ: Residual Quantization for Video Perception
* ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking
* Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
* Rethinking Data Distillation: Do Not Overlook Calibration
* Rethinking Fast Fourier Convolution in Image Inpainting
* Rethinking Mobile Block for Efficient Attention-based Models
* Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling
* Rethinking Point Cloud Registration as Masking and Reconstruction
* Rethinking pose estimation in crowds: overcoming the detection information bottleneck and ambiguity
* Rethinking Range View Representation for LiDAR Segmentation
* Rethinking Safe Semi-supervised Learning: Transferring the Open-set Problem to A Close-set One
* Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation
* Rethinking Video Frame Interpolation from Shutter Mode Induced Degradation
* Rethinking Vision Transformers for MobileNet Size and Speed
* Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement
* Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation
* Retrospect to Multi-prompt Learning across Vision and Language, A
* Revisit PCA-based technique for Out-of-Distribution Detection
* Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling
* Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach
* Revisiting Scene Text Recognition: A Data Perspective
* Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
* RFD-ECNet: Extreme Underwater Image Compression with Reference to Feature Dictionary
* RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World
* Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
* RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction
* RIGID: Recurrent GAN Inversion and Editing of Real Face Videos
* RLIPv2: Fast Scaling of Relational Language-Image Pre-training
* RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation
* RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks
* Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
* Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion
* Robust Evaluation of Diffusion-Based Adversarial Purification
* Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
* Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering
* Robust Heterogeneous Federated Learning under Data Corruption
* Robust Mixture-of-Expert Training for Convolutional Neural Networks
* Robust Monocular Depth Estimation under Challenging Conditions
* Robust Object Modeling for Visual Tracking
* Robust One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2
* Robust Referring Video Object Segmentation with Cyclic Structural Consensus
* Robustifying Token Attention for Vision Transformers
* Role-aware Interaction Generation from Textual Description
* ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation
* Root Pose Decomposition Towards Generic Non-rigid 3D Reconstruction with Monocular Videos
* Rosetta Neurons: Mining the Common Units in a Model Zoo
* RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
* RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition
* RSFNet: A White-Box Image Retouching Approach using Region-Specific Color Filters
* S-TREK: Sequential Translation and Rotation Equivariant Keypoints for local feature extraction
* S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces
* S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields
* SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection
* SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability
* SAFE: Machine Unlearning With Shard Graphs
* SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection
* SAFL-Net: Semantic-Agnostic Feature Learning Network with Auxiliary Plugins for Image Manipulation Detection
* SAGA: Spectral Adversarial Geometric Attack on 3D Meshes
* SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation
* SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation
* Saliency Regularization for Self-Training with Partial Annotations
* Sample-adaptive Augmentation for Point Cloud Recognition Against Real-world Corruptions
* Sample-wise Label Confidence Incorporation for Learning with Noisy Labels
* Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation
* SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image
* Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs
* SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
* SATR: Zero-Shot Semantic Segmentation of 3D Shapes
* SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data
* Scalable Diffusion Models with Transformers
* Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process
* Scalable Video Object Segmentation with Simplified Framework
* Scale-Aware Modulation Meet Transformer
* Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
* Scaling Data Generation in Vision-and-Language Navigation
* SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval
* ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes
* Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
* ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering
* Scene as Occupancy
* Scene Graph Contrastive Learning for Embodied Navigation
* Scene Matters: Model-based Deep Video Compression
* Scene-Aware Feature Matching
* Scene-Aware Label Graph Learning for Multi-Label Image Classification
* SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
* Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation
* SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
* Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising
* Score-Based Diffusion Models as Principled Priors for Inverse Imaging
* Scratch Each Other's Back: Incomplete Multi-modal Brain Tumor Segmentation Via Category Aware Group Self-Support Learning
* Scratching Visual Transformer's Back with Uniform Attention
* Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields
* Search for or Navigate to? Dual Adaptive Thinking for Object Navigation
* See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data
* SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes
* Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery based on Reinforcement Learning
* SEFD: Learning to Distill Complex Pose and Occlusion
* SegGPT: Towards Segmenting Everything In Context
* Segment Anything
* Segment Every Reference Object in Spatial and Temporal Spaces
* Segmentation of Tubular Structures Using Iterative Training with Tailored Samples
* Segmenting Known Objects and Unseen Unknowns without Prior Knowledge
* SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning
* SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning
* SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
* Self-Calibrated Cross Attention Network for Few-Shot Segmentation
* Self-Evolved Dynamic Expansion Model for Task-Free Continual Learning
* Self-Feedback DETR for Temporal Action Detection
* Self-Ordering Point Clouds
* Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning
* Self-regulating Prompts: Foundational Model Adaptation without Forgetting
* Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search
* Self-Supervised Burst Super-Resolution
* Self-supervised Character-to-Character Distillation for Text Recognition
* Self-supervised Cross-view Representation Reconstruction for Change Captioning
* Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network
* Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects
* Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive
* Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network
* Self-supervised Monocular Depth Estimation: Let's Talk About The Weather
* Self-supervised Monocular Underwater Depth Recovery, Image Restoration, and a Real-sea Video Dataset
* Self-Supervised Object Detection from Egocentric Videos
* Self-supervised Pre-training for Mirror Detection
* Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition
* Semantic Information in Contrastive Learning
* Semantic-Aware Dynamic Parameter for Video Inpainting Transformer
* Semantic-Aware Implicit Template Learning via Part Deformation Consistency
* Semantically Structured Image Compression via Irregular Group-Based Decoupling
* Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
* Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning
* Semantify: Simplifying the Control of 3D Morphable Models using CLIP
* SemARFlow: Injecting Semantics into Unsupervised Optical Flow Estimation for Autonomous Driving
* Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch
* Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups
* Semi-supervised Semantics-guided Adversarial Training for Robust Trajectory Prediction
* Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding
* Sempart: Self-supervised Multi-resolution Partitioning of Image Semantics
* Sentence Attention Blocks for Answer Grounding
* Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance, A
* Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions
* Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
* SFHarmony: Source Free Domain Adaptation for Distributed Neuroimaging Analysis
* SG-Former: Self-guided Transformer with Evolving Token Reallocation
* SGAligner: 3D Scene Alignment with Scene Graphs
* SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations
* Shape Analysis of Euclidean Curves under Frenet-Serret Framework
* Shape Anchor Guided Holistic Indoor Scene Understanding
* ShapeScaffolder: Structure-Aware 3D Shape Generation from Text
* Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
* SHERF: Generalizable Human NeRF from a Single Image
* Shift from Texture-bias to Shape-Bias: Edge Deformation-Based Augmentation for Robust Object Recognition
* ShiftNAS: Improving One-shot NAS via Probability Shift
* Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction
* Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
* SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning
* sigma-Adaptive Decoupled Prototype for Few-Shot Object Detection
* SIGMA: Scale-Invariant Global Sparse Shape Matching
* Sigmoid Loss for Language Image Pre-Training
* Sign Language Translation with Iterative Prototype
* SiLK: Simple Learned Keypoints
* SILT: Shadow-aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels
* SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning
* Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation
* SimMatchV2: Semi-Supervised Learning with Graph Consistency
* SimNP: Learning Self-Similarity Priors Between Neural Points
* Simoun: Synergizing Interactive Motion-appearance Understanding for Vision-based Reinforcement Learning
* Simple and Effective Out-of-Distribution Detection via Cosine-based Softmax Loss
* Simple Baselines for Interactive Video Retrieval with Questions and Answers
* Simple Framework for Open-Vocabulary Segmentation and Detection, A
* Simple Recipe to Meta-Learn Forward and Backward Transfer, A
* Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection, A
* SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
* Simulating Fluids in Real-World Still Images
* SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
* SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
* Single Depth-image 3D Reflection Symmetry and Shape Prediction
* Single Image Deblurring with Row-dependent Blur Magnitude
* Single Image Defocus Deblurring via Implicit Neural Inverse Kernels
* Single Image Reflection Separation via Component Synergy
* Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
* SIRA-PCR: Sim-to-Real Adaptation for 3D Point Cloud Registration
* Size Does Matter: Size-aware Virtual Try-on via Clothing-oriented Transformation Try-on Network
* SKED: Sketch-guided Text-based 3D Editing
* skeletonization algorithm for gradient-based optimization, A
* SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
* SkeleTR: Towards Skeleton-based Action Recognition in the Wild
* Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation
* Skill Transformer: A Monolithic Policy for Mobile Manipulation
* Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning
* SKiT: a Fast Key Information Video Transformer for Online Surgical Phase Recognition
* SlaBins: Fisheye Depth Estimation using Slanted Bins on Road Environments
* SLAN: Self-Locator Aided Network for Vision-Language Understanding
* SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model
* Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning
* SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
* SMMix: Self-Motivated Image Mixing for Vision Transformers
* Smoothness Similarity Regularization for Few-Shot GAN Adaptation
* Snow Removal in Video: A New Dataset and A Novel Method
* SOAR: Scene-debiasing Open-set Action Recognition
* Social Diffusion: Long-term Multiple Human Motion Anticipation
* SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations
* SoDaCam: Software-defined Cameras via Single-Photon Imaging
* soft nearest-neighbor framework for continual semi-supervised learning, A
* Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
* Sound Source Localization is All about Cross-Modal Alignment
* Source-free Depth for Object Pop-out
* Source-free Domain Adaptive Human Pose Estimation
* Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation
* Space-time Prompting for Video Class-incremental Learning
* SPACE: Speech-driven Portrait Animation with Controllable Expression
* SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference
* Spacetime Surface Regularization for Neural Dynamic Scene Reconstruction
* Sparse Instance Conditioned Multimodal Trajectory Prediction
* Sparse Point Guided 3D Lane Detection
* Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks
* SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
* SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining
* SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
* SparseMAE: Sparse Training Meets Masked Autoencoders
* SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis
* Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
* Spatial-Aware Token for Weakly Supervised Object Localization
* Spatially and Spectrally Consistent Deep Functional Maps
* Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution
* Spatio-Temporal Crop Aggregation for Video Representation Learning
* Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception
* Spatio-temporal Prompting Network for Robust Video Feature Extraction
* Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images
* Spectrum-guided Multi-granularity Referring Video Object Segmentation
* Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
* Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation
* Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution
* SpinCam: High-Speed Imaging via a Rotating Point-Spread Function
* SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes
* Spurious Features Everywhere: Large-Scale Detection of Harmful Spurious Features in ImageNet
* SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking
* SRFormer: Permuted Self-Attention for Single Image Super-Resolution
* SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning
* SSDA: Secure Source-Free Domain Adaptation
* SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow
* Stabilizing Visual Reinforcement Learning via Asymmetric Interactive Cooperation
* Stable Cluster Discrimination for Deep Clustering
* Stable Signature: Rooting Watermarks in Latent Diffusion Models, The
* StableVideo: Text-driven Consistency-aware Diffusion Video Editing
* StageInteractor: Query-based Object Detector with Cross-stage Interaction
* Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
* STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning
* StegaNeRF: Embedding Invisible Information within Neural Radiance Fields
* step towards understanding why classification helps regression, A
* STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos
* Stochastic Segmentation with Conditional Categorical Diffusion Models
* Story Visualization by Online Text Augmentation with Context Memory
* STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition
* Strata-NeRF: Neural Radiance Fields for Stratified Scenes
* Strip-MLP: Efficient Token Interaction for Vision MLP
* Strivec: Sparse Tri-Vector Radiance Fields
* Structural Alignment for Network Pruning through Partial Regularization
* Structure and Content-Guided Video Synthesis with Diffusion Models
* Structure Invariant Transformation for better Adversarial Transferability
* Structure-Aware Surface Reconstruction via Primitive Assembly
* Studying How to Efficiently and Effectively Guide Models with Explanations
* StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
* StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation
* StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
* StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation
* StyleLipSync: Style-based Personalized Lip-sync Video Generation
* StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model
* Subclass-balancing Contrastive Learning for Long-tailed Recognition
* SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets
* Supervised Homography Learning with Realistic Dataset Generation
* SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection
* Surface Extraction from Neural Unsigned Distance Fields
* Surface Normal Clustering for Implicit Representation of Manhattan Scenes
* SurfsUp: Learning Fluid Simulation for Novel Surfaces
* SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
* SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
* SVDFormer: Complementing Point Cloud via Self-view Augmentation and Self-structure Dual-generator
* SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
* SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation
* SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
* SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM
* SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-time Performance on Mobile Device
* SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
* Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking
* Synthesizing Diverse Human Motions in 3D Indoor Scenes
* Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
* Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
* TALL: Thumbnail Layout for Deepfake Video Detection
* Taming Contrast Maximization for Learning Sequential, Low-latency, Event-based Optical Flow
* Tangent Model Composition for Ensembling and Continual Fine-tuning
* Tangent Sampson Error: Fast Approximate Two-view Reprojection Error for Central Camera Models
* TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
* TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation
* Task Agnostic Restoration of Natural Video Dynamics
* Task-Aware Adaptive Learning for Cross-domain Few-Shot Learning
* Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models
* TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
* Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation
* TCOVIS: Temporally Consistent Online Video Instance Segmentation
* Teaching CLIP to Count to Ten
* TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection
* Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
* Template Inversion Attack against Face Recognition Systems using 3D Face Reconstruction
* Template-guided Hierarchical Feature Restoration for Anomaly Detection
* TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
* Temporal Collection and Distribution for Referring Video Object Segmentation
* Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
* Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation
* Test Time Adaptation for Blind Image Quality Assessment
* Test-time Personalizable Forecasting of 3D Human Poses
* Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra
* TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
* Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models
* Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization
* Text2Performer: Text-Driven Human Video Generation
* Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
* Text2Tex: Text-driven Texture Synthesis via Diffusion Models
* Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
* TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
* TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
* Texture Generation on 3D Meshes with Point-UV Diffusion
* Texture Learning Domain Randomization for Domain Generalized Segmentation
* TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
* Theoretical and Numerical Analysis of 3D Reconstruction Using Point and Line Incidences
* Theory of Topological Derivatives for Inverse Rendering of Geometry, A
* Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks
* TiDAL: Learning Training Dynamics for Active Learning
* TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions
* TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
* TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models
* Tiled Multiplane Images for Practical 3D Photography
* Time-to-Contact Map by Joint Estimation of Up-to-Scale Inverse Depth and Global Motion using a Single Event Camera
* TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
* TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
* TMA: Temporal Motion Aggregation for Event-based Optical Flow
* TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis
* To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation
* Token-Label Alignment for Vision Transformers
* Too Large; Data Reduction for Vision-Language Pre-Training
* ToonTalker: Cross-Domain Face Reenactment
* TopoSeg: Topology-Aware Nuclear Instance Segmentation
* TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
* Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
* Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge
* Toward Unsupervised Realistic Visual Question Answering
* Towards Attack-tolerant Federated Learning via Critical Parameter Analysis
* Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
* Towards Better Robustness against Common Corruptions for Unsupervised Domain Adaptation
* Towards Building More Robust Models with Frequency Bias
* Towards Content-based Pixel Retrieval in Revisited Oxford and Paris
* Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning
* Towards Effective Instance Discrimination Contrastive Loss for Unsupervised Domain Adaptation
* Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection
* Towards Fairness-aware Adversarial Network Pruning
* Towards General Low-Light Raw Noise Synthesis and Modeling
* Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning
* Towards Geospatial Foundation Models via Continual Pretraining
* Towards Grand Unified Representation Learning for Unsupervised Visible-Infrared Person Re-Identification
* Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
* Towards High-Quality Specular Highlight Removal by Leveraging Large-Scale Synthetic Data
* Towards Improved Input Masking for Convolutional Neural Networks
* Towards Inadequately Pre-trained Models in Transfer Learning
* Towards Instance-adaptive Inference for Federated Learning
* Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks
* Towards Models that Can See and Read
* Towards Multi-Layered 3D Garments Animation
* Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
* Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization
* Towards Open-Vocabulary Video Instance Segmentation
* Towards Real-World Burst Image Super-Resolution: Benchmark and Method
* Towards Realistic Evaluation of Industrial Continual Learning Scenarios with an Emphasis on Energy Consumption and Computational Footprint
* Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild
* Towards Robust Model Watermark via Reducing Parametric Vulnerability
* Towards Saner Deep Image Registration
* Towards Semi-supervised Learning with Non-random Missing Labels
* Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View
* Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
* Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations
* Towards Universal LiDAR-Based 3D Object Detection by Multi-Domain Knowledge Transfer
* Towards Unsupervised Domain Generalization for Face Anti-Spoofing
* Towards Viewpoint Robustness in Bird's Eye View Segmentation
* Towards Viewpoint-Invariant Visual Recognition via Adversarial Training
* Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing
* Towards Zero-Shot Scale-Aware Monocular Depth Estimation
* Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence
* TrackFlow: Multi-Object Tracking with Normalizing Flows
* Tracking Anything with Decoupled Video Segmentation
* Tracking by 3D Model Estimation of Unknown Objects in Videos
* Tracking by Natural Language Specification with Long Short-term Context Decoupling
* Tracking Everything Everywhere All at Once
* Tracking without Label: Unsupervised Multiple Object Tracking via Contrastive Similarity Learning
* Traj-MAE: Masked Autoencoders for Trajectory Prediction
* Trajectory Unified Transformer for Pedestrian Trajectory Prediction
* TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses
* TrajPAC: Towards Robustness Verification of Pedestrian Trajectory Prediction Models
* TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
* Transferable Adversarial Attack for Both Vision Transformers and Convolutional Networks via Momentum Integrated Gradients
* Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
* TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering
* TransIFF: An Instance-Level Feature Fusion Framework for Vehicle-Infrastructure Cooperative 3D Detection with Transformers
* Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
* Transparent Shape from a Single View Polarization Image
* TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
* Treating Pseudo-labels Generation as Image Matting for Weakly Supervised Semantic Segmentation
* Tree-Structured Shading Decomposition
* Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
* TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching
* TRM-UAP: Enhancing the Transferability of Data-Free Universal Adversarial Perturbation via Truncated Ratio Maximization
* Troubleshooting Ethnic Quality Bias with Curriculum Domain Adaptation for Face Image Quality Assessment
* Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation
* Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
* Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
* Tuning Pre-trained Model via Moment Probing
* Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers
* Two-in-One Depth: Bridging the Gap Between Monocular and Binocular Self-supervised Depth Estimation
* U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds
* UATVR: Uncertainty-Adaptive Text-Video Retrieval
* UCF: Uncovering Common Features for Generalizable Deepfake Detection
* UGC: Unified GAN Compression for Efficient Image-to-Image Translation
* UHDNeRF: Ultra-High-Definition Neural Radiance Fields
* UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework
* UMFuse: Unified Multi View Fusion for Human Editing applications
* UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction
* Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
* Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching
* Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
* Uncertainty-aware Unsupervised Multi-Object Tracking
* Uncertainty-guided Learning for Improving Image Manipulation Detection
* Under-Display Camera Image Restoration with Scattering Effect
* Understanding 3D Object Interaction from a Single Image
* Understanding Hessian Alignment for Domain Generalization
* Understanding Self-attention Mechanism via Dynamical System Perspective
* Understanding the Feature Norm for Out-of-Distribution Detection
* Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging
* Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction
* UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
* UniFace: Unified Cross-Entropy Loss for Deep Face Recognition
* Unified Adversarial Patch for Cross-modal Attacks in the Physical World
* Unified Coarse-to-Fine Alignment for Video-Text Retrieval
* Unified Continual Learning Framework with General Parameter-Efficient Tuning, A
* Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning
* Unified Framework for Robustness on Diverse Sampling Errors, A
* Unified Out-Of-Distribution Detection: A Model-Specific Perspective
* Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
* Unified Visual Relationship Detection with Vision and Language Models
* UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
* UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View
* Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
* UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors
* Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection
* UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
* UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
* UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation
* UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation
* Universal Domain Adaptation via Compressive Attention Matching
* UniverSeg: Universal Medical Image Segmentation
* UniVTG: Towards Unified Video-Language Temporal Grounding
* Unleashing Text-to-Image Diffusion Models for Visual Perception
* Unleashing the Potential of Spiking Neural Networks with Dynamic Confidence
* Unleashing the Power of Gradient Signal-to-Noise Ratio for Zero-Shot NAS
* Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
* UnLoc: A Unified Framework for Video Localization Tasks
* Unmasked Teacher: Towards Training-Efficient Video Foundation Models
* Unmasking Anomalies in Road-Scene Segmentation
* Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map
* Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation, The
* Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving
* Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples
* Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
* Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning
* Unsupervised Domain Adaptive Detection with Network Stability Analysis
* Unsupervised Facial Performance Editing via Vector-Quantized StyleGAN Representations
* Unsupervised Feature Representation Learning for Domain-generalized Cross-domain Image Retrieval
* Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches
* Unsupervised Learning of Object-Centric Embeddings for Cell Instance Segmentation in Microscopy Images
* Unsupervised Manifold Linearizing and Clustering
* Unsupervised Object Localization with Representer Point Selection
* Unsupervised Open-Vocabulary Object Localization in Videos
* Unsupervised Prompt Tuning for Text-Driven Object Detection
* Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding
* Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model
* Unsupervised Video Deraining with An Event Camera
* Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning
* UpCycling: Semi-supervised 3D Object Detection without Sharing Raw-level Unlabeled Scenes
* Urban Radiance Field Representation with Deformable Neural Mesh Primitives
* UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields
* USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised Semantic Segmentation
* Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation
* V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
* V3Det: Vast Vocabulary Visual Detection Dataset
* VAD: Vectorized Scene Representation for Efficient Autonomous Driving
* VADER: Video Alignment Differencing and Retrieval
* Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction
* VAPCNet: Viewpoint-Aware 3D Point Cloud Completion
* Variational Causal Inference Network for Explanatory Visual Question Answering
* Variational Degeneration to Structural Refinement: A Unified Framework for Superimposed Image Decomposition
* Verbs in Action: Improving verb understanding in video-language models
* VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis
* Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
* VertexSerum: Poisoning Graph Neural Networks for Link Inference
* VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations
* Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data, The
* Video Action Recognition with Attentive Semantic Units
* Video Action Segmentation via Contextually Refined Temporal Keypoints
* Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation
* Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks
* Video Background Music Generation: Dataset, Method and Evaluation
* Video Object Segmentation-aware Video Frame Interpolation
* Video OWL-ViT: Temporally-consistent open-world localization in video
* Video State-Changing Object Segmentation
* Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
* Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
* VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
* VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
* View Consistent Purification for Accurate Cross-View Localization
* Viewing Graph Solvability in Practice
* ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding
* Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
* ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
* ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
* ViM: Vision Middleware for Unified Downstream Transferring
* ViperGPT: Visual Inference via Python Execution for Reasoning
* Virtual Try-On with Pose-Garment Keypoints Guided Inpainting
* Visible-Infrared Person Re-Identification via Semantic Alignment and Affinity Inference
* Vision Grid Transformer for Document Layout Analysis
* Vision HGNN: An Image is More than a Graph of Nodes
* Vision Relation Transformer for Unbiased Scene Graph Generation
* Vision Transformer Adapters for Generalizable Multitask Learning
* Visual Explanations via Iterated Integrated Attributions
* Visual Traffic Knowledge Graph Generation from Scene Images
* Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
* VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching
* VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control
* VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
* VLSlice: Interactive Vision-and-Language Slice Discovery
* VoroMesh: Learning Watertight Surface Meshes with Voronoi Diagrams
* Vox-E: Text-guided Voxel Editing of 3D Objects
* VQ3D: Learning a 3D-Aware Generative Model on ImageNet
* VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
* VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
* Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
* WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
* Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation
* Wasserstein Expansible Variational Autoencoder for Discriminative and Generative Continual Learning
* WaterMask: Instance Segmentation for Underwater Imagery
* WaveIPT: Joint Attention and Flow Alignment in the Wavelet domain for Pose Transfer
* WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields
* WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis
* Weakly Supervised Learning of Semantic Correspondence through Cascaded Online Correspondence Refinement
* Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency
* Weakly-supervised 3D Pose Transfer with Keypoints
* Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling
* Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos
* Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding
* What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations
* What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Networks
* What Can Simple Arithmetic Operations Do for Temporal Modeling?
* What do neural networks learn in image classification? A frequency shortcut perspective
* What does a platypus look like? Generating customized prompts for zero-shot image classification
* What does CLIP know about a red circle? Visual prompt engineering for VLMs
* When Do Curricula Work in Federated Learning?
* When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo
* When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method
* When Prompt-based Incremental Learning Does Not Meet Strong Pretraining
* When to Learn What: Model-Adaptive Data Augmentation Curriculum
* Who are you referring to? Coreference resolution in image narrations
* Why do networks have inhibitory/negative connections?
* Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
* Will Large-scale Generative Models Corrupt Future Datasets?
* Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles are More Efficient than Single Models
* With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
* Workie-Talkie: Accelerating Federated Learning by Overlapping Computing and Communications via Contrastive Regularization
* X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
* X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
* XiNet: Efficient Neural Networks for tinyML
* XMem++: Production-level Video Segmentation From Few Annotated Frames
* XNet: Wavelet-Based Low and High Frequency Fusion Networks for Fully- and Semi-Supervised Semantic Segmentation of Biomedical Images
* XVO: Generalized Visual Odometry via Cross-Modal Self-Training
* Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
* You Never Get a Second Chance To Make a Good First Impression: Seeding Active Learning for 3D Semantic Segmentation
* Your Diffusion Model is Secretly a Zero-Shot Classifier
* Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving
* Zero-1-to-3: Zero-shot One Image to 3D Object
* Zero-guidance Segmentation Using Zero Segment Labels
* Zero-Shot Composed Image Retrieval with Textual Inversion
* Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
* Zero-Shot Point Cloud Segmentation by Semantic-Visual Aware Synthesis
* Zero-shot spatial layout conditioning for text-to-image diffusion models
* Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
* Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction
* zPROBE: Zero Peek Robustness Checks for Federated Learning
* 3D Curve Based Matching Method Using Dynamic Programming
* 3D Symmetry-Curvature Duality Theorems
* Active Vision
* Against Quantitative Optical Flow
* Ambiguities of a Motion Field
* Analysis of a Road Image as Seen from a Vehicle
* Asp: A Continuous Viewer-Centered Representation for 3D Object Recognition, The
* Autonomous Land Vehicle Road Following
* Building, Registering, and Fusing Noisy Visual Maps
* Closed-Form Solutions to Image Flow Equations for 3D Structure and Motion
* Complexity Level Analysis of Immediate Vision, A
* Composite Edge Detection with Random Field Models
* Computationally-Efficient Methods for Recovering Translational Motion
* Computing the Euler Number of a 3D Image
* Contour Based Stereo Algorithm, A
* Control Free, Low-Level Image Segmentation: Theory, Architecture, and Experimentation
* Coordinate Rotation Invariance of Image Characteristics for 3D Shape and Motion Recovery
* Cylindrical Shape from Contour and Shading without Knowledge of Lighting Conditions or Surface Albedo
* Depth Reconstruction in Stereopsis
* Detecting Moving Objects
* Detecting Textons and Texture Boundaries in Natural Images
* Deterministic Bayesian Estimation of Markovian Random Fields with Applications to Computational Vision
* Development, Implementation, Testing, and Application of an Affine Transform Invariant Curvature Function
* Discrete Scale-Space Representation, A
* Early Detection of Motion Boundaries, The
* Error Analysis of Motion Parameter Estimation from Image Sequences
* Estimation of Image Motion Using Wavefront Region Growing
* Estimation of Motion and Structure of 3-D Objects from a Sequence of Images
* Evidence-Based 3D Vision System for Range Images, An
* Extracting Surfaces from Stereo Images: an Integrated Approach
* Fast and Reliable Passive Trinocular Stereovision
* From Waltz to Winston (via the Connection Table)
* Hierarchical Likelihood Approach for Region Segmentation According to Motion-Based Criteria, A
* Highlight Identification Using Chromatic Information
* Highspeed Stereo Matching System Based on Dynamic Programming, A
* Image Algebra in a Nutshell
* Image Sequence Analysis of Complex Physical Objects: Nonlinear Small Scale Water Surface Waves
* Image Sequence Segmentation Using Motion Coherence
* Inferring Surfaces from Boundaries
* Interaction of Different Modules in Depth Perception
* Invariant Properties of the Projections of Straight Homogeneous Generalized Cylinders
* Layout2: A Production System Modeling Visual Perspective Information
* Learning To Recognize Objects Using Feature Indexed Hypotheses
* Linking Image-Space and Accumulator-Space: A New Approach for Object Recognition
* Local Shape from Specularity
* Low Level Image Analysis on an MIMD Architecture
* Matching from 3-D Range Models into 2-D Intensity Scenes
* Matching Image Edges to Object Memory
* Method for Enforcing Integrability in Shape from Shading Algorithms, A
* Method for Initial Hypothesis Formation in Image Understanding, A
* Model for the Extraction of Image Flow, A
* Model of Visual Knowledge Representation, A
* Motion and Structure from Motion from Point and Line Matches
* Multi-Resolution Morphology
* Multisensor Integration: Experiments in Integrating Thermal and Visual Sensors
* New Approach to Surface Reconstruction: The Coupled Depth/Slope Model, A
* Object Recognition Using Alignment
* On Projective Geometry and the Recovery of 3-D Structure
* On the Geometric Interpretation of Image Contours
* One-Dimensional Regularization with Discontinuities
* Optical Flow Using Spatiotemporal Filters
* Optimal Edge Detection Using Recursive Filtering
* Optimal Orientation Detection of Linear Symmetry
* Parallel Algorithms for Computer Vision on the Connection Machine
* Perception and Computation
* Position-, Rotation-, and Scale-Invariant Pattern Recognition Using Parallel Distributed Processing
* Problem of Robust Shape Descriptions, The
* Propagation of Interpretations Based on Graded Resolution Input
* Quantization Errors in Stereo Triangulation
* Range-Imaging System Utilizing Nematic Liquid Crystal Mask
* Recognition by Parts
* Recognition of Object Families Using Parameterized Models
* Reconstruction of Two-Dimensional Velocity Fields as a Linear Estimation Problem
* Reconstruction of Surfaces from Profiles
* Representation and Three-Dimensional Interpretation of Image Texture: An Integrated Approach
* Representation of Variables in Connectionist Networks, The
* Scale Change Versus Scale Space Representation
* Shape Decomposition by Mathematical Morphology
* Shape from Darkness
* Shape Recognition from Single Silhouettes
* Singularities of Contrast Functions in Scale Space
* Snakes: Active Contour Models
* Solution and Uniqueness of Image Flow Equations for Rigid Curved Surfaces in Motion
* Solving Ill-Conditioned Problems by Minimizing Equation Error
* Spectral and Polarization Stereo Methods Using a Single Light Source
* Stereo Error Detection, Correction, and Evaluation
* Stereopsis and Eye-Movement
* Symmetry-Seeking Models and 3D Object Reconstruction
* Three Dimensional Object Representation Revisited
* Tracing Surfaces for Surfacing Traces
* Two Dimensional Optimal Edge Recognition Using Matched and Wiener Filters for Machine Vision
* Two-Dimensional Solution to the Problem of Zero-Crossings and Spatiotemporal Interpolation in Computer and Human Vision, A
* Unified Approach to the Linear Camera Calibration Problem, A
* Unified Perspective on Computational Techniques for the Measurement of Visual Motion, A
* Unsupervised Bayesian Model-Learning with Application to Textured and Polynomial Image Segmentation
* Using a Color Reflection Model to Separate Highlights from Object Color
* What is Regular in Regularization?
* Adaptive Clustering Algorithm for Image Segmentation, An
* Admissibility of Constraint Functions in Relaxation Labeling
* Aligning a Model to an Image Using Minimal Information
* Alignment of Objects with Smooth Surfaces, The
* Analysis of a Sequence of Stereo Scenes Containing Multiple Moving Objects Using Rigidity Constraints
* Application of Qualitative Depth and Shape from Stereo
* Applying Sensor Models to Automatic Generation of Object Recognition Programs
* Aspect Graphs and Nonlinear Optimization in 3-D Object Recognition
* Brightness-Based Stereo Matching
* Color from Black and White
* Color Image Analysis with an Intrinsic Reflection Model
* Color Reflectance Model and Its Use for Segmentation, A
* Combinatorics of Object Recognition in Cluttered Environments Using Constrained Search, The
* Computational Aspects of Determining Optical Flow
* Constraint-Based System for Interpretation of Aerial Imagery, A
* Coping with Discontinuities in Computer Vision: Their Detection, Classification and Measurement
* Creating the Perspective Projection Aspect Graph of Polyhedral Objects
* Creation of Structure in Dynamic Shape, The
* Detecting Specular Reflection Using Lambertian Constraints
* Determining the Optimal Weights in Multiple Objective Function Optimization
* Efficiently Computing and Representing Aspect Graphs of Polyhedral Objects
* Egomotion and the Stabilized World
* Error of Fit Measures for Recovering Parametric Solids
* Estimating Motion from Sparse Range Data Without Correspondence
* Evolution Properties of Space Curves
* Eye Fixation and Early Vision: Kinetic Depth
* Feasibility of Motion and Structure Computations, The
* Geometric Hashing: A General and Efficient Model-Based Recognition Scheme
* Geometry from Specularities
* How the Delaunay Triangulation Can Be Used for Representing Stereo Data
* Image Description Via the Multiresolution Intensity Axis of Symmetry
* Information in the Geometric Structure of Retinal Flow Fields
* Learnable and Nonlearnable Visual Concepts
* Matching Perspective Images Using Geometric Constraints and Perceptual Grouping
* Measuring Image Flow by Tracking Edge-Lines
* Modal Control of an Attentive Vision System
* Morphological Algorithms for Computing Non-Planar Point Neighborhoods on Cellular Automata
* Morphological Feature Detection
* Motion and Depth from Binocular Orthographic Views
* Motion Coherence Theory, The
* Multigrid Bayesian Estimation of Image Motion Fields Using Stochastic Relaxation
* Multiple Shape-from-Texture into Texture Analysis and Surface Segmentation
* New Model-Based Stereo Approach for 3D Surface Reconstruction Using Contours on the Surface Pattern, A
* Nonlinear Approach to the Motion Correspondence Problem, A
* Novel Approach to Colour Constancy, A
* Occlusion-Sensitive Matching
* On the Congruence of Noisy Images to Line Segment Models
* On the Sensitivity of the Hough Transform for Object Recognition
* Optic Acceleration
* Optimal Computing of Structure from Motion Using Point Correspondences in Two Frames
* Optimal Corner Detector
* Optimal Morphological Approaches to Image Matching and Object Detection
* Organization of Curve Detection: Coarse Tangent Fields and Fine Spline Coverings, The
* Organization of Smooth Image Curves at Multiple Scales
* Parallel Depth Recovery by Changing Camera Parameters
* Parallel Optical Flow Using Local Voting
* Perceiving Structure from Motion: Failure of Shape Constancy
* Polynomial Methods for Structure from Motion
* Pyramid Implementation of Optimal Step Conjugate Search Algorithms for Some Computer Vision Problems
* Recognize the Similarity Between Shapes Under Affine Transformation
* Recognizing 3-D Objects Using Surface Descriptions
* Reconstruction of Consistent Shape from Inconsistent Data: Optimization of 2.5D Sketches
* Reconstruction of Surfaces of 3-D Objects by M-array Pattern Projection Method
* Recovering Shape Deformation by an Extended Circular Image Representation
* Representing Oriented Piecewise C^2 Surfaces
* Robust Depth Estimation from Optical Flow
* Robust Window Operators
* Self-Calibration of Stereo Cameras
* Shape From Angles Under Perspective Projection
* Shape Information from Shading: A Theory about Human Perception
* Singularities of Principal Direction Fields from 3-D Images
* Space-Time Sampling with Motion Uncertainty: Constraints on Space-Time Filtering
* Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network
* Structure and Motion from Two Perspective Views Via Planar Patch
* Surface Reconstruction by Dynamic Integration of Focus, Camera Vergence, and Stereo
* Surface Reconstruction from Image Sequences
* Synergistic Smooth Surface Stereo
* System for 3-D Workpieces Recognition, A
* Temporal Edges: The Detection of Motion and the Computation of Optical Flow
* Theory on Optical Velocity Fields and Ambiguous Motion of Curves, A
* Towards Real-Time Trinocular Stereo
* Towards the Automatic Generation of Recognition Strategies
* Translating Optical Flow into Token Matches and Depth from Looming
* Trinocular General Support Algorithm: A Three-Camera Stereo Algorithm for Overcoming Binocular Matching Errors, The
* Two-View Matching
* Unsupervised Bayesian Estimation for Segmenting Textured Images
* Using Dynamic Programming for Minimizing the Energy of Active Contours in the Presence of Hard Constraints
* Using Flow Field Divergence for Obstacle Avoidance: Towards Qualitative Vision
* Using Symmetries for Analysis of Shape from Contour
* 2.1-D Sketch, The
* 3D Structure from a Monocular Sequence of Images
* About Lacunarity, Some Links Between Fractal and Integral Geometry and an Application to Texture Segmentation
* Accurate Corner Detection: An Analytical Study
* Active 3D Object Models
* Active Surface Reconstruction by Integrating Focus, Vergence, Stereo, and Camera Calibration
* Analysis of Facial Images Using Physical and Anatomical Models
* Appearance-Model-Based Representation and Matching of 3-D Objects
* Approach to 3D Scene Reconstruction from Noisy Binocular Image Sequences Using Information Fusion, An
* Approach to Color Constancy Using Multiple Images, An
* Bayesian Decision Theoretic Approach for Adaptive Goal-Directed Sensing, A
* BONSAI: 3D Object Recognition Using Constrained Search
* Calculating Surface Reflectance Using a Single-Bounce Model of Mutual Reflection
* Collaboration Between Computer Graphics and Computer Vision
* Compact Image Representation from Multiscale Edges
* Computational Model for Face Location, A
* Computing Optical Flow from an Overconstrained System of Linear Algebraic Equations
* Computing Spatiotemporal Surface Flow
* Computing the Visual Potential of an Articulated Assembly of Parts
* Computing Two Motions from Three Frames
* Cooperative Integration of Multiple Stereo Algorithms
* Curved Inertia Frames and the Skeleton Sketch: Finding Salient Frames of Reference
* Decomposition Theory and Transformations of Visual Directions
* Description and Reconstruction from Image Trajectories of Rotational Motion
* Detecting and Localizing Edges Composed of Steps, Peaks and Roofs
* Detecting Height from Constrained Motion
* Detection of Convex and Concave Discontinuous Points in a Plane Curve
* Detection of Interest Points Using Symmetry
* Determining Back-Facing Curved Model Surfaces by Analysis at the Boundary
* Determining Camera Rotation from Vanishing Points of Lines on Horizontal Planes
* Determining Reflectance Parameters Using Range and Brightness Images
* Direct Computation of Qualitative 3-D Shape and Motion Invariants
* Direct Estimation of Deformable Motion Parameters from Range Image Sequence
* Direct Motion Stereo: Recovery of Observer Motion and Scene Structure
* Direct Recovery of Motion and Shape in the General Case by Fixation
* Dynamic 3D Models with Local and Global Deformations: Deformable Superquadrics
* Dynamic Analysis of Apparent Contours, The
* Dynamic Depth Extraction Method, A
* Dynamic Edge Warping: Experiments in Disparity Estimation under Weak Constraints
* Dynamic Integration of Height Maps into a 3D World Representation from Range Image Sequences
* Dynamic Stereo with Self-Calibration
* Effect of Indexing on the Complexity of Object Recognition, The
* Efficient Method for Multiple-Circle Detection, An
* Epicardial Motion and Deformation Estimation from Coronary Artery Bifurcation Points
* Estimation of Shape, Reflection Coefficients and Illuminant Direction from Image Sequences
* Estimation-Theoretic Framework for Image-Flow Computation, An
* Evolution and Testing of a Model-Based Object Recognition System, The
* Extended Structure and Motion Analysis from Monocular Image Sequences
* Fast Algorithm for Active Contours, A
* Feature Matching for Object Localization in the Presence of Uncertainty
* Feed-Forward Recovery of Motion and Structure from a Series of 2D-Lines Matches
* Finite Element Method Applied to New Active Contour Models and 3D Reconstruction from Cross Sections, A
* Framework for Adaptive Scale Space Tracking Solutions to Problems in Computational Vision, A
* From Uncertainty to Visual Exploration
* Geometrical Learning from Multiple Stereo Views Through Monocular Based Feature Grouping
* Hypothesis Testing: A Framework for Analysing and Optimising Hough Transform Performance
* Hypothesizing and Testing Geometric Attributes of Image Data
* Image Interpretation Using Multi-Relational Grammars
* Indexing Via Color Histograms
* Integrated Treatment of Matching and Measurement Errors for Robust Model-Based Motion Tracking
* Integration of Region and Edge-Based Segmentation, The
* Interesting Patterns for Model-Based Machine Vision
* Interpolating Cubic Spline Contours by Minimizing Second Derivative Discontinuity
* Invariance: A New Framework for Vision
* Learning 3D Object Recognition Strategies
* Local Spatial Frequency Analysis for Computer Vision
* Locally Adaptive Window for Signal Matching, A
* Matching Range Images of Human Faces
* Model for the Detection of Motion over Time, A
* Modeling the Rim Appearance
* MRF Model-Based Segmentation of Range Images
* Multiple Light Source Optical Flow
* Multiple Widths Yield Reliable Finite Differences
* Multiresolution Image Acquisition and Surface Reconstruction
* Multispectral Constraints for Optical Flow Computation
* New Transform for Curve Detection, A
* Object Recognition by a Hopfield Neural Network
* Object Recognition Using a Feature Search Strategy Generated from a 3-D Model
* Occlusion Detection in Early Vision
* Omni-directional Stereo for Making Global Map
* On the Extensive Reconstruction of Hough Transforms
* On the Sensitivity of Geometric Hashing
* Parallel Structure Recognition with Uncertainty: Coupled Segmentation and Matching
* Perceptual Organization of Occluding Contours
* Photometric Invariant and Shape Constraints at Parabolic Points, A
* Photometric Motion
* Pose Determination from Line-to-Plane Correspondences: Existence Condition and Closed-Form Solutions
* Qualitative 3-D Shape Reconstruction Using Distributed Aspect Graph Matching
* Qualitative Route Scene Description Using Autonomous Landmark Detection
* Recognition-Based Reconstruction of an Indoor Scene Using an Integration of Active and Passive Sensing Techniques
* Reconstructing 3D Lines from a Sequence of 2D Projections: Representation and Estimation
* Reconstructing Line Drawings from Wings: the Polygonal Case
* Reconstruction without Discontinuities
* Recovering 3D Motion and Structure from Stereo and 2D Token Tracking Cooperation
* Representation and the Dimensions of Shape Deformation
* Representing Surface Curvature Discontinuities on Curved Surfaces
* Robust Curve Detection by Temporal Geodesics
* Robustness of Correspondence-Based Structure from Motion
* Scale Detection and Region Extraction from a Scale-Space Primal Sketch
* Segmentation as the Search for the Best Description of Images in Terms of Primitives
* Segmentation by Minimal Description
* Segmentation of 3-D Range Images Using Pyramidal Data Structures
* Segmentation of Optical Flow and 3D Data for the Interpretation of Mobile Objects
* Segmenting Curves into Elliptic Arcs and Straight Lines
* Sensitivity of the Pose Refinement Problem to Accurate Estimation of Camera Parameters
* Shape and Motion without Depth
* Shape from Contour: Straight Homogeneous Generalized Cones
* Shape from Interreflections
* Shape from Texture: The Homogeneity Hypothesis
* Similarity Extraction and Modelling
* Simple Method for Computing 3D Motion and Depth
* Simultaneous Estimation of Shape and Reflectance Map from Photometric Stereo
* Steerable Filters for Early Vision, Image Analysis and Wavelet Decomposition
* Surface Reconstruction Using Deformable Models with Interior and Boundary Constraints
* Surface Shape Reconstruction of an Undulating Transparent Object
* Temporally Integrated Surface Reconstruction
* Terrain Matching by Analysis of Aerial Images
* Theory of Image Matching, A
* Toward the Automating of Mathematical Morphology Procedures Using Predicate Logic
* Towards a Computational Theory of Model Based Vision and Perception
* Tracking and Grouping 3D Line Segments
* Uncertainty in Interpretation of Range Imagery
* Understanding Assembly Illustrations in an Assembly Manual Without Any Model of Mechanical Parts
* Uniqueness, the Minimum Norm Constraint, and Analog Networks for Optical Flow Along Contours
* Vanishing Point Calculation as a Statistical Inference on the Unit Sphere
* Viewpoint Invariant Recovery
* 2-D Digital Curve Analysis: A Regularity Measure
* 3D Object Recognition by Indexing Structural Invariants from Multiple Views
* Accurate Line Parameters from an Optimising Hough Transform for Vanishing Point Detection
* Active and Intelligent Sensing of Road Obstacles: Application to the European Eureka-PROMETHEUS Project
* Active Exploration: Knowing When We're Wrong
* Affine-Invariant Contour Tracking with Automatic Control of Spatiotemporal Scale
* All-Transputer Visual Autobahn-Autopilot/Copilot, An
* Automatic Feature Point Extraction and Tracking in Image Sequences for Unknown Camera Motion
* Binocular Stereo Algorithm for Reconstructing Sloping, Creased, and Broken Surfaces in the Presence of Half-Occlusion, A
* Building and Using Flexible Models Incorporating Grey-Level Information
* Combining Stereo and Motion Analysis for Direct Estimation of Scene Structure
* Complete Two-Plane Camera Calibration Method and Experimental Comparisons, A
* Computation of Ego-Motion and Structure from Visual and Inertial Sensors Using the Vertical Cue
* Computational Model of Neural Contour Processing: Figure-Ground Segregation and Illusory Contours, A
* Contextual Feature Similarities for Model-Based Object Recognition
* Cooperation of Visually Guided Behaviors
* Critical Sets of Lines for Camera Displacement Estimation: A Mixed Euclidean-Projective and Constructive Approach, The
* Design for a Visual-Motion Transducer, A
* Diagonal Transforms Suffice for Color Constancy
* Diffuse Shading, Visibility Fields, and the Geometry of Ambient Light
* Direct Estimation of Multiple Disparities for Transparent Multiple Surfaces in Binocular Stereo
* Discrete Models for Energy-Minimizing Segmentation
* Distance Accumulation and Planar Curvature
* Dynamic Calibration of an Active Stereo Head
* Dynamic Fixation
* Egomotion Analysis Based on the Frenet-Serret Motion Model
* Eliciting Qualitative Structure from Image Curve Deformations
* Enhanced Image Capture Through Fusion
* Estimation of the Light Source Distribution and its Use in Integrated Shape Recovery from Stereo and Shading
* Euclidean Constraints for Uncalibrated Reconstruction
* Experiments with Monocular Visual Tracking and Environment Modeling
* Exploiting the Generic View Assumption to Estimate Scene Parameters
* Extension of Marr's Signature Based Edge Classification and Other Methods Determining Diffuseness and Height of Edges, and Bar Edge Width, An
* Extracting Projective Structure from Single Perspective Views of 3D Point Sets
* Fast and Robust 3D Recognition by Alignment
* Fast Segmentation, Tracking, and Analysis of Deformable Objects
* Finite Element Model for 3D Shape Reconstruction and Nonrigid Motion Tracking, A
* Framework for the Robust Estimation of Optical Flow, A
* Generalized Brightness Change Model for Computing Optical Flow, A
* Global Algorithm for Shape from Shading, A
* Grasping Visual Symmetry
* Head-Centered Orientation Strategies in Animate Vision
* Improved Algorithm for Algebraic Curve and Surface Fitting, An
* Incremental Image Sequence Enhancement with Implicit Motion Compensation
* Integration of Quantitative and Qualitative Techniques for Deformable Model Fitting from Orthographic, Perspective, and Stereo Projections
* Interpretation of Natural Scenes Using Multi-Parameter Default Models and Qualitative Constraints
* Large Deformable Splines, Crest Lines and Matching
* Learning Object Recognition Models from Images
* Learning Recognition and Segmentation of 3-D Objects from 2-D Images
* Linear and Incremental Acquisition of Invariant Shape Models from Image Sequences
* Linear Complexity Procedure for Labelling Line Drawings of Polyhedral Scenes Using Vanishing Points, A
* Localization Using Combinations of Model Views
* Looking for Trouble: Using Causal Semantics to Direct Focus of Attention
* Mathematical Morphology: The Hamilton-Jacobi Connection
* Minimum Description Length Based 2-D Shape Description
* Modal Framework for Correspondence and Description, A
* Motion Detection Robust to Perturbations: A Statistical Regularization and Temporal Integration Framework
* Motion Segmentation and Local Structure
* Multi-Scale Vector-Ridge-Detection for Perceptual Organization Without Edges
* Multiple Knowledge Sources and Evidential Reasoning for Shape Recognition
* Multiscale Markov Random Field Models for Parallel Image Classification
* Note on Existence and Uniqueness in Shape from Shading, A
* Occam Algorithms for Computing Visual-Motion
* Optical Flow from 1-D Correlation: Application to a Simple Time-to-Crash Detector
* Optimal Estimation of Object Pose from a Single Perspective View
* Perceptually Plausible Model for Global Symmetry Detection, A
* Probabilistic Relaxation for Matching Problems in Computer Vision
* Projective Depth: A Geometric Invariant for 3D Reconstruction from Two Perspective/Orthographic Views and for Visual Recognition
* Projectively Invariant Decomposition and Recognition of Planar Shapes
* Quantitative Analysis of the Viewpoint Consistency Constraint in Model-Based Vision
* Quantitative Methodology for Analyzing the Performance of Detection Algorithms, A
* Reactions to Peripheral Image Motion Using a Head/Eye Platform
* Reciprocal-Wedge Transform for Space-Variant Sensing
* Recognition of Object Classes from Range Data
* Recognizing Algebraic Surfaces from Their Outlines
* Recognizing Mice, Vegetables and Hand Printed Characters Based on Implicit Polynomials, Invariants and Bayesian Methods
* Recovering Reflectance and Illumination in a World of Painted Polyhedra
* Reflectance Ratio: A Photometric Invariant for Object Recognition
* Relative 3D Positioning and 3D Convex Hull Computations from a Weakly Calibrated Stereo Pair
* Relative Depth from Vergence Micromovements
* Renormalization for Unbiased Estimation
* Robust 3-D 3-D Pose Estimation
* Robust Active Contour Model with Insensitive Parameters, A
* Robust Computation of Optical-Flow in a Multiscale Differential Framework
* Robust Line-Based Pose Estimation from a Single Image
* Robust Structure from Motion using Motion Parallax
* Robust Vergence with Concurrent Detection of Occlusion and Specular Highlights
* Segmentation and 2D Motion Estimation by Region Fragments
* Shape from Texture from a Multi-Scale Perspective
* Silhouette-Based Object Recognition through Curvature Scale Space
* Spherical Representation for the Recognition of Curved Objects, A
* Stereo Matching, Reconstruction and Refinement of 3D Curves Using Deformable Contours
* Surface Discontinuities in Range Images
* System for Automatic Vectorization and Interpretation of Map-Drawings, A
* Texture Discrimination by Local Generalized Symmetry
* Tracking Foveated Corner Clusters Using Affine Structure
* Tracking Non-Rigid Objects in Complex Scenes
* Understanding Noise: The Critical Role of Motion Error in Scene Reconstruction
* Using Hyperquadrics for Shape Recovery from Range Data
* Vision-Based Construction of CAD Models from Range Images
* Visual Echo Analysis
* 3D Human Body Model Acquisition from Multiple Views
* 3D Pose Estimation by Fitting Image Gradients Directly to Polyhedral Models
* 3D Surface Reconstruction from Stereoscopic Image Sequences
* 3D-2D Projective Registration of Free-Form Curves and Surfaces
* Accurate Internal Camera Calibration Using Rotation, with Analysis of Sources of Error
* Active Fixation Using Attentional Shifts, Affine Resampling, and Multiresolution Search
* Active Visual Navigation Using Non-Metric Structure
* Adaptive Model Evolution Using Blending
* Affine Surface Reconstruction by Purposive Viewpoint Control
* Algorithms for Implicit Deformable Models
* Alignment by Maximization of Mutual Information
* Analytical and Experimental Study of the Performance of Markov Random-Fields Applied to Textured Images Using Small Samples, An
* Animat Vision: Active Vision in Artificial Animals
* Annular Symmetry Operators: A Method for Locating and Describing Objects
* ASSET-2: Real-Time Motion Segmentation and Shape Tracking
* Automatic Recognition of Human Facial Expressions
* Auxiliary Variables for Deformable Models Computer Vision Problems
* Bayesian Decision Theory, the Maximum Local Mass Estimate
* Better Optical Triangulation Through Spacetime Analysis
* Calibration-Free Visual Control Using Projective Invariance
* Class-Based Grouping in Perspective Images
* Closed-World Tracking
* Closing the Loop on Multiple Motions
* Color Constancy in Diagonal Chromaticity Space
* Color Constancy Under Varying Illumination
* Combining Color and Geometric Information for the Illumination Invariant Recognition of 3D Objects
* Combining Color and Geometry for the Active, Visual Recognition of Shadows
* Comparison of Projective Reconstruction Methods for Pairs of Views, A
* Complete Scene Structure from Four Point Correspondences
* Computation of Coherent Optical Flow by Using Multiple Constraints
* Computing Visual Correspondence: Incorporating the Probability of a False Match
* Cosmos: A Representation Scheme for Free-Form Surfaces
* Curve and Surface Smoothing Without Shrinkage
* Deformable Velcro(TM) Surfaces
* Detecting Kinetic Occlusion
* Determining Wet Surfaces from Dry
* Direct Estimation of Affine Deformations Using Visual Front-End Operators with Automatic Scale Selection
* Dynamic Rigid Motion Estimation from Weak Perspective
* Electronically Directed Focal Stereo
* Elimination: An Approach to the Study of 3D-from-2D
* Epipole and Fundamental Matrix Estimation Using Virtual Parallax
* Estimating Motion and Structure from Correspondences of Line Segments between Two Perspective Images
* Estimating the Tensor of Curvature of a Surface from a Polyhedral Approximation
* Expected Performance of Robust Estimators Near Discontinuities
* Face Detection by Fuzzy Pattern Matching
* Face Recognition from One Example View
* Facial Expression Recognition Using a Dynamic Model and Motion Energy
* Fast Object Recognition in Noisy Images Using Simulated Annealing
* Finding Faces in Cluttered Scenes Using Labelled Random Graph Matching
* FORMS: A Flexible Object Recognition and Modelling System
* Gabor Wavelets for 3-D Object Recognition
* Geodesic Active Contours
* Geometric Criterion for Shape-Based Non-Rigid Correspondence, A
* Global Rigidity Constraints in Image Displacement Fields
* Gradient Flows and Geometric Active Contour Models
* Head-Eye Calibration
* Hierarchical Statistical Models for the Fusion of Multiresolution Image Data
* Hypergeometric Filters for Optical Flow and Affine Matching
* Illumination-Invariant Recognition of Texture in Color Images
* Image Segmentation by Reaction-Diffusion Bubbles
* Improving Laser Triangulation Sensors Using Polarization
* In Defence of the 8-Point Algorithm
* Indexing Visual Representations Through the Complexity Map
* Integral Approach to Free-Form Object Modeling, An
* Integrated Stereo-Based Approach to Automatic Vehicle Guidance, An
* Invariant of a Pair of Non-Coplanar Conics in Space: Definition, Geometric Interpretation and Computation
* Invariant-Based Recognition of Complex Curved 3-D Objects from Image Contours
* Layered Representation of Motion Video Using Robust Maximum-Likelihood Estimation of Mixture Models and MDL Encoding
* Learning Geometric Hashing Functions for Model Based Object Recognition
* Learning-Based Hand Sign Recognition Using SHOSLIF-M
* Linear Method for Reconstruction from Lines and Points, A
* Locating Objects Using the Hausdorff Distance
* Matching Constraints and the Joint Image
* Matching of 3D Curves Using Semi-Differential Invariants
* Model of Figure-Ground Segregation from Kinetic Occlusion, A
* Model-Based 2D and 3D Dominant Motion Estimation for Mosaicing and Video Representation
* Model-Based Integrated Approach to Track Myocardial Deformation Using Displacement and Velocity Constraints, A
* Model-Based Matching of Line Drawings by Linear Combinations of Prototypes
* Model-Based Tracking of Self-Occluding Articulated Objects
* Monocular Tracking of the Human Arm in 3D
* Mosaic Based Representations of Video Sequences and Their Applications
* Motion Analysis with a Camera with Unknown, and Possibly Varying Intrinsic Parameters
* Motion Estimation With Quadtree Splines
* Motion from the Frontier of Curved Surfaces
* Multi-Body Factorization Method for Motion Analysis, A
* Multibaseline Stereo System with Active Illumination and Real-Time Image Acquisition, A
* Multiscale Detection of Curvilinear Structures in 2D and 3D Image Data
* Nonlinear Manifold Learning for Visual Speech Recognition
* Nonparametric Approach for Camera Calibration, The
* Object Indexing Using an Iconic Sparse Distributed Memory
* Object Pose: Links Between Paraperspective and Perspective
* On Multi-Feature Integration for Deformable Boundary Finding
* On Representation and Matching of Multi-Coloured Objects
* On the Geometry and Algebra of the Point and Line Correspondences between N Images
* Optical Flow and Deformable Objects
* Optimal RBF Networks for Visual Learning
* Optimal Subpixel Matching of Contour Chains and Segments
* Perceptual Organization in an Interactive Sketch Editing Application
* Polymorphic Grouping for Image Segmentation
* Probabilistic 3D Object Recognition
* Probabilistic Visual Learning for Object Detection
* Quantitative Analysis of View Degeneracy and its Use for Active Focal Length Control, A
* Real-Time Focus Range Sensor
* Real-Time Obstacle Avoidance Using Central Flow Divergence and Peripheral Flow
* Real-Time X-Ray Inspection of 3D Defects in Circuit Board Patterns
* Recognition of Human Body Motion Using Phase Space Constraints
* Recognition Using Region Correspondences
* Recognizing 3D Objects Using Photometric Invariant
* Reconstructing Complex Surfaces from Multiple Stereo Views
* Reconstruction from Image Sequences by Means of Relative Depths
* Recovering 3D Motion and Structure of Multiple Objects Using Adaptive Hough Transform
* Recovering Object Surfaces from Viewed Changes in Surface Texture Patterns
* Recursive Filter for Phase Velocity Assisted Shape-Based Tracking of Cardiac Non-Rigid Motion, A
* Reflectance Function Estimation and Shape Recovery from Image Sequence of a Rotating Object
* Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation
* Region Correspondence by Inexact Attributed Planar Graph Matching
* Region Tracking Through Image Sequences
* Relational Matching with Dynamic Graph Structures
* Rendering Real-World Objects Using View Interpolation
* Results Using Random Field Models for the Segmentation of Color Images
* Rigid-Body Segmentation and Shape-Description from Dense Optical-Flow Under Weak Perspective
* Rigidity Checking of 3D Point Correspondences Under Perspective Projection
* Robot Aerobics: Four Easy Steps to a More Flexible Calibration
* Robot System That Observes and Replicates Grasping Tasks, A
* Robust Detection of Degenerate Configurations for the Fundamental Matrix
* Robust Real Time Tracking and Classification of Facial Expressions
* Saliency Maps and Attention Selection in Scale and Spatial Coordinates: An Information Theoretic Approach
* Scale-Space from Nonlinear Filters
* Seeing Behind the Scene: Analysis of Photometric Properties of Occluding Edges by the Reversed Projection Blurring Model
* Segmented Shape Description from 3-View Stereo
* Shape and Model from Specular Motion
* Shape Extraction for Curves Using Geometry-Driven Diffusion and Functional Optimization
* Shape from Shading with Interreflections Under Proximal Light Source: 3D Shape Reconstruction of Unfolded Book Surface from a Scanner Image
* Site Model Acquisition and Extension from Aerial Images
* Snake for Model-Based Segmentation, A
* State Based Technique for the Summarization and Recognition of Gesture, A
* Statistical Learning, Localization, and Identification of Objects
* Steerable Wedge Filters
* Stereo in the Presence of Specular Reflection
* Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience
* Structure and Motion Estimation from Dynamic Silhouettes under Perspective Projection
* Structure and Semi-Fluid Motion Analysis of Stereoscopic Satellite Images for Cloud Tracking
* Surface Geometry from Cusps of Apparent Contours
* Surface Orientation and Curvature from Differential Texture Distortion
* Surface Reconstruction: GNCs and MFA
* Task-Oriented Generation of Visual Sensing Strategies
* Texture Segmentation and Shape in the Same Image
* Theory of Specular Surface Geometry, A
* Topologically Adaptable Snakes
* Towards a Unified IU Environment: Coordination of Existing IU Tools with the IUE
* Towards an Active Visual Observer
* Tracking and Recognizing Rigid and Non-Rigid Facial Motions Using Local Parametric Models of Image Motion
* Transfer of Fixation for an Active Stereo Platform via Affine Structure Recovery
* Trilinearity of Three Perspective Views and its Associated Tensor
* Unified Approach for Coding and Interpreting Face Images, A
* Unifying Framework for Structure and Motion Recovery from Image Sequences, A
* Unsupervised Parallel Image Classification Using a Hierarchical Markovian Model
* Validation of 3D Registration Methods Based on Points and Frames
* Vision Based Hand Modeling and Tracking for Virtual Teleconferencing and Telecollaboration
* Visual Navigation Using a Single Camera
* Volumetric Deformable Models with Parameter Functions: A New Approach to the 3D Motion Analysis of the LV from MRI-SPAMM
* Weakly-Calibrated Stereo Perception for Rover Navigation
* 2D Affine Transformations Cannot Account for Human 3D Object Recognition
* 3D Modeling of Human Lip Motions
* 3D Photography on Your Desk
* 3D Point Distribution Models of the Cortical Sulci
* 3D Reconstruction with Projective Octrees and Epipolar Geometry
* 3D Shape and Motion Analysis from Image Blur and Smear: A Unified Approach
* 3D Shape Reconstruction Using Volume Intersection Techniques
* Accurate, Real-Time, Unadorned Lip Tracking
* Achieving a Fitts Law Relationship for Visual Guided Reaching
* Acquiring 3D Object Models from Specular Motion Using Circular Lights Illumination
* Active Blobs
* Affine Invariant Medial Axis and Skew Symmetry
* Affine Reconstruction of Curved Surfaces from Uncalibrated Views of Apparent Contours
* Agent Orientated Annotation in Model Based Visual Surveillance
* Ambiguity in Reconstruction from Images of Six Points
* ASL Recognition Based on a Coupling Between HMMs and 3D Motion Analysis
* Automatic Generation of Robot Program Code: Learning from Perceptual Data
* Automatic Model Construction, Pose Estimation, and Object Recognition from Photographs using Triangular Splines
* Automatic Registration of 3-D Ultrasound Images
* Automatic Tracking of Human Motion in Indoor Scenes Across Multiple Synchronized Video Streams
* Bias-Corrected Optical Flow Estimation for Road Vehicle Tracking
* Bilateral Filtering for Gray and Color Images
* Bilinear Voting
* Building Qualitative Event Models Automatically from Visual Input
* Cascaded Hough Transform as an Aid in Aerial Image Interpretation, A
* Chromaticity Space for Specularity-, Illumination Color- and Illumination Pose-Invariant 3-D Object Recognition, A
* Color Recognition in Outdoor Images
* Color- and Texture-Based Image Segmentation Using the Expectation-Maximization Algorithm and its Application to Content-Based Image Retrieval
* Comparing and Evaluating Interest Points
* Comparing Curved-Surface Range Image Segmenters
* Computing Ritz Approximations of Primary Images
* Condensing Image Databases when Retrieval is Based on Non-Metric Distances
* Consensus Surfaces for Modeling 3D Objects from Multiple Range Images
* Constructing Virtual Worlds Using Dense Stereo
* Construction and Refinement of Panoramic Mosaics with Global and Local Alignment
* Contagion-Based Image Segmentation and Labeling
* Cooperative Framework for Segmentation Using 2D Active Contours and 3D Hybrid Models as Applied to Branching Cylindrical Structures, A
* Cubist Approach to Object Recognition, A
* Curvature-Based Approach to Contour Motion Estimation, A
* Deformable Model-Based Shape and Motion Analysis from Images using Motion Residual Error
* Depth Discontinuities by Pixel-to-Pixel Stereo
* Design of Multiparameter Steerable Functions Using Cascade Basis Reduction
* Detecting Changes in Aerial Views of Man-Made Structures
* Ego-Motion and Omnidirectional Cameras
* Egomotion Estimation Using Log-Polar Images
* Error-Tolerant Visual Planning of Planar Grasp
* Estimation with Bilinear Constraints in Computer Vision
* Euclidean Structure from Uncalibrated Images Using Fuzzy Domain Knowledge: Application to Facial Images Synthesis
* Face Surveillance
* Fast and Robust Approach for Registration of Partially Overlapping Range Images, A
* Fast Stereovision with Subpixel-Precision
* Finding Faces in Photographs
* Finding Periodicity in Space and Time
* Finding the Epipole from Uncalibrated Optical Flow
* Fish-Scales: Representing Fuzzy Manifolds
* Framework for Modeling Appearance Change in Image Sequences, A
* From Projective to Euclidean Space Under any Practical Situation, a Criticism of Self-Calibration
* General Framework for Object Detection, A
* Geotensity: Combining Motion and Lighting for 3D Surface Reconstruction
* GRADE: Gibbs Reaction and Diffusion Equation
* Grouping Based on Projective Geometry Constraints and Uncertainty
* Human Face Recognition: A Minimal Evidence Approach
* Hyperbolic Smoothing of Shapes
* Illumination-Invariant Color Object Recognition via Compressed Chromaticity Histograms of Color-Channel-Normalized Images
* Image Indexing using Composite Color and Shape Invariant Features
* Independent 3D Motion Detection Using Residual Parallax Normal Flow Fields
* Indexing Images by Trees of Visual Content
* Information-Conserving Object Recognition
* Initialization of Deformable Models from 3D Data
* Integrated Surface, Curve and Junction Inference from Sparse 3-D Data Sets
* Intensity and Feature Based Stereo Matching by Disparity Parametrization
* Iterative Multi-Step Explicit Camera Calibration
* Learned Temporal Models of Image Motion
* Learning to Identify and Track Faces in Image Sequences
* Linear N >= 4-Point Pose Determination
* Local Scale Controlled Anisotropic Diffusion with Local Noise Estimate for Image Smoothing and Edge Detection
* Local Symmetries of Shapes in Arbitrary Dimension
* Maintaining Multiple Motion Model Hypotheses over Many Views to Recover Matching and Structure
* Maximum-Flow Formulation of the N-Camera Stereo Correspondence Problem, A
* Metric for Distributions with Applications to Image Databases, A
* Minimizing Algebraic Error in Geometric Estimation Problems
* Mixed-State Condensation Tracker with Automatic Model-Switching, A
* Mixtures of Eigen Features for Real-Time Structure from Texture
* Model Selection and Surface Merging in Reconstruction Algorithms
* Modeling Geometric Structure and Illumination Variation of a Scene from Real Images
* Morphological Corner Detection
* Motion Estimation in Image Sequences Using the Deformation of Apparent Contours
* Motion Segmentation and Tracking Using Normalized Cuts
* Multidimensional Morphable Models
* Multigrid Approach for Hierarchical Motion Estimation, A
* Multiscale Annealing for Real-Time Unsupervised Texture Segmentation
* Nonlinear Method for Estimating the Projective Geometry of Three Views, A
* Object Tracking Using Deformable Templates
* Optical Flow Estimation Using Wavelet Motion Model
* Optimal Polyline Tracking for Artery Motion Compensation in Coronary Angiography
* Optimal Recovery of Depth from Defocused Images Using an MRF Model
* Parameterized Image Varieties: A Novel Approach to the Analysis and Synthesis of Image Sequences
* Parameterized Modeling and Recognition of Activities
* Passive Depth from Defocus Using a Spatial Domain Approach
* PDE-Based Level-Set Approach for Detection and Tracking of Moving Objects, A
* Physics-based 3D Position Analysis of a Soccer Ball from Monocular Image Sequences
* PIMs and Invariant Parts for Shape Recognition
* Plenoptic Image Editing
* Probabilistic Contour Discriminant for Object Localisation, A
* Probabilistic Framework for Edge Detection and Scale Selection, A
* Quadric Reconstruction from Dual-Space Geometry
* Reading between the Lines: A Method for Extracting Dynamic 3D with Texture
* Real-Time Algorithm for Medical Shape Recovery, A
* Recognition and Interpretation of Parametric Gesture
* Recognition of 3D Free-Form Objects Using Segment-Based Stereo Vision
* Recognition of Plane Projective Symmetry
* Recognizing Novel 3-D Objects Under New Illumination and Viewing Position Using a Small Number of Examples
* Recovering Epipolar Geometry by Reactive Tabu Search
* Relational Histograms for Shape Indexing
* Representation and Self-Similarity of Shapes
* Resolution-Appropriate Shape Representation
* Retrieving Images by Appearance
* Robotic Control with Partial Visual Information
* Robust Computation and Parametrization of Multiple View Relations
* Robust Contour Tracking in Echocardiographic Sequences
* Robust Multi-Sensor Image Alignment
* Robust Tracking with Spatio-Velocity Snakes: Kalman Filtering Approach
* Schwarz Representation for Matching and Similarity Analysis
* Sectored Snakes: Evaluating Learned-Energy Segmentations
* Segmentation of Range Data into Rigid Subsets Using Surface Patches
* Segmenting Cortical Gray Matter for Functional MRI Visualization
* Self-Calibrating a Stereo Head: An Error Analysis in the Neighbourhood of Degenerate Configurations
* Self-Calibration and Euclidean Reconstruction Using Motions of a Stereo Rig
* Self-Calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters
* Self-Calibration from Image Derivatives
* Separability of Pose and Expression in Facial Tracking and Animation
* Separation of Transparent Layers using Focus
* Shading Primitives: Finding Folds and Shallow Grooves
* Shape Recovery Using Dynamic Subdivision Surfaces
* Shock Graphs and Shape Matching
* Signfinder: Using Color to Detect, Localize and Identify Informational Signs
* Snake Pedals: Geometric Models with Physics-Based Control
* Spatial Color Indexing and Applications
* State Space Construction for Behavior Acquisition in Multi Agent Environments with Vision and Action
* Stereo Depth Estimation: A Confidence Interval Approach
* Stereo Matching with Transparency and Matting
* Stereo with Mirrors
* Task Driven 3D Object Recognition System Using Bayesian Networks, A
* Theory of Catadioptric Image Formation, A
* Three Dimensional MR Brain Segmentation
* Thresholding for Change Detection
* Tracking Meteorological Structures Through Curve Matching Using Geodesic Paths
* Transinformation for Active Object Recognition
* Two-Dimensional Affine Invariants that Distribute Uniformly and Can Be Tuned to any Convex Feature Domain
* Two-Stage Robust Statistical Method for Temporal Registration from Features of Various Type, A
* Understanding Object Motion
* Understanding the Relationship Between the Optimization Criteria in Two-View Motion Analysis
* Unified Factorization Algorithm for Points, Line Segments and Planes with Uncertainty Models, A
* Universal Mosaicing using Pipe Projection
* Using Algebraic Functions of Views for Indexing-Based Object Recognition
* Using Conic Correspondence in Two Images to Estimate the Epipolar Geometry
* Using Expectation-Maximisation to Learn a Dynamical Model for a Tracker from Measurement Sequences
* Utilization of Stereo Disparity and Optical Flow Information for Human Interaction
* View-Based Object Matching
* Visual Homing: Surfing on the Epipoles
* Visual Motion Estimation and Prediction: A Probabilistic Network Model for Temporal Coherence
* Visual Routines for Autonomous Driving
* What Can Projections of Flow Fields Tell Us About Visual Motion
* When is it Possible to Identify 3D Objects From Single Images Using Class Constraints?
* Which Shape from Motion?
