MVA21
* *MVA
* Action Spotting and Temporal Attention Analysis in Soccer Videos
* Adversarial Defense Through High Frequency Loss Variational Autoencoder Decoder and Bayesian Update With Collective Voting
* Analysis of Evaluation Metrics with the Distance between Positive Pairs and Negative Pairs in Deep Metric Learning
* Angular Margin Constrained Loss for Automatic Liver Fibrosis Staging
* Attention Mining Branch for Optimizing Attention Map
* Augmenting Discriminative Correlation Filters with Stereo Blob Tracking for Long-Term Tracking of Underwater Animals
* AVM Image Quality Enhancement by Synthetic Image Learning for Supervised Deblurring
* baseline for semi-supervised learning of efficient semantic segmentation models, A
* Bi-directional Recurrent MVSNet for High-resolution Multi-view Stereo
* Boosting Semi-Supervised Anomaly Detection via Contrasting Synthetic Images
* Content Filtering in Streaming Video Using Domain Adaptation
* Contextual Information based Network with High-Frequency Feature Fusion for High Frame Rate and Ultra-Low Delay Small-Scale Object Detection
* Crack Segmentation for Low-Resolution Images using Joint Learning with Super-Resolution
* Critically Compressed Quantized Convolution Neural Network based High Frame Rate and Ultra-Low Delay Fruit External Defects Detection
* Cut and paste curriculum learning with hard negative mining for point-of-sale systems
* Data Augmentation for Human Motion Prediction
* Distant Bird Detection for Safe Drone Flight and Its Dataset
* Efficient transfer learning for multi-channel convolutional neural networks
* Encoding-free Incrementing Hough Transform for High Frame Rate and Ultra-low Delay Straight-line Detection
* Estimating Contribution of Training Datasets using Shapley Values in Data-scale for Visual Recognition
* Expandable Spherical Projection and Feature Fusion Methods for Object Detection from Fisheye Images
* Facial landmark detection transfer learning for a specific user in driver status monitoring systems
* FBNet: FeedBack-Recursive CNN for Saliency Detection
* Group Activity Recognition Using Joint Learning of Individual Action Recognition and People Grouping
* HMA-Depth: A New Monocular Depth Estimation Model Using Hierarchical Multi-Scale Attention
* Human-Object Interaction Detection with Missing Objects
* Illumination Planning for Measuring Per-Pixel Surface Roughness
* Image Information Assistance Neural Network for VideoPose3D-based Monocular 3D Pose Estimation
* Information Hiding Using a Coded Aperture as a Key
* Japanese Sentence Dataset for Lip-reading
* Joint Learning of Object Detection and Pose Estimation using Augmented Autoencoder
* Learning VAE with Categorical Labels for Generating Conditional Handwritten Characters
* Leveraging Frequency Based Salient Spatial Sound Localization to Improve 360° Video Saliency Prediction
* Live Video Action Recognition from Unsupervised Action Proposals
* Lossless AI: Toward Guaranteeing Consistency between Inferences Before and After Quantization via Knowledge Distillation
* Machine-learning-based Quality-level-estimation System for Inspecting Steel Microstructures
* Model-based Crack Width Estimation using Rectangle Transform
* Multi-Modal Pedestrian Detection with Large Misalignment Based on Modal-Wise Regression and Multi-Modal IoU
* Multi-physical and Temporal Feature Based Self-correcting Approximation Model for Monocular 3D Volleyball Trajectory Analysis
* Multiple Fisheye Camera Calibration and Stereo Measurement Methods for Uniform Distance Errors throughout Imaging Ranges
* Occlusion-Robust 3D Hand Pose Estimation from a Single RGB Image
* On the Influence of Viewpoint Change for Metric Learning
* Open-set Recognition with Supervised Contrastive Learning
* Optical Model for Show-through Cancellation in Ancient Document Imaging with Dark and Bright Mounts, An
* Output augmentation works well without any domain knowledge
* Pix2Point: Learning Outdoor 3D Using Sparse Point Clouds and Optimal Transport
* Position Estimation of Pedestrians in Surveillance Video Using Face Detection and Simple Camera Calibration
* Practical Descattering of Transmissive Inspection Using Slanted Linear Image Sensors
* Predicting Next Local Appearance for Video Anomaly Detection
* Recurrent RLCN-Guided Attention Network for Single Image Deraining
* Relational Subgraph for Graph-based Path Prediction
* ROT-Harris: A Dynamic Approach to Asynchronous Interest Point Detection
* Saliency based Subject Selection for Diverse Image Captioning
* Seeing Farther Than Supervision: Self-supervised Depth Completion in Challenging Environments
* Selecting an Iconic Pose From an Action Video
* Self-Supervised Deep Fisheye Image Rectification Approach using Coordinate Relations
* Semantic Hierarchy Preserving Deep Hashing for Large-Scale Image Retrieval
* Shape from shading and polarization constrained by approximate shape
* Shape-Based Floor Plan Retrieval Using Parse Tree Matching
* Synthetically Generating Motion Blur in a Depth Map from Time-of-Flight Sensors
* Temporal Extension for Encoder-Decoder-based Crowd Counting Approaches
* Understanding the Reason for Misclassification by Generating Counterfactual Images
* Video Summarization With Frame Index Vision Transformer
* Video-Based Camera Localization Using Anchor View Detection and Recursive 3D Reconstruction
* Weakly Supervised Domain Adaptation using Super-pixel labeling for Semantic Segmentation
66 for MVA21
MVA23
* *Most Influential Paper over the Decade Award
* *MVA
* Age Prediction From Face Images Via Contrastive Learning
* ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition
* Automated Identification of Surgical Instruments without Tagging: Implementation in Real Hospital Work Environment
* Automatic Reconstruction of Semantic 3D Models from 2D Floor Plans
* BandRe: Rethinking Band-Pass Filters for Scale-Wise Object Detection Evaluation
* Black-box Adversarial Attack against Visual Interpreters for Deep Neural Networks
* Bottleneck Transformer model with Channel Self-Attention for skin lesion classification
* Can you read lips with a masked face?
* CG-based dataset generation and adversarial image conversion for deep cucumber recognition
* Combining Knowledge Distillation and Transfer Learning for Sensor Fusion in Visible and Thermal Camera-based Person Classification
* Combining Static Specular Flow and Highlight with Deep Features for Specular Surface Detection
* Contrastive Knowledge Distillation for Anomaly Detection in Multi-Illumination/Focus Display Images
* Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
* Deep Randomized Time Warping for Action Recognition
* Diabetic Retinopathy Grading based on a Sparse Network Fusion of Heterogeneous ConvNeXt Models with Category Attention
* Domain Adaptation from Visible-Light to FIR with Reliable Pseudo Labels
* Dynamic Transfer for Domain Adaptation in Crowd Counting
* Enhancing Retail Product Recognition: Fine-Grained Bottle Size Classification
* Ensemble Fusion for Small Object Detection
* Generalizable Solar Irradiation Prediction using Large Transformer Models with Sky Imagery
* Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan
* Grid Sample Based Temporal Iteration and Compactness-coefficient Distance for High Frame and Ultra-low Delay SLIC Segmentation System
* Hardware-Aware Zero-Shot Neural Architecture Search
* Hierarchical Spatio-Temporal Neural Network with Displacement Based Refinement for Monocular Head Pose Prediction
* Human Pose Prediction by Progressive Generation in Multi-scale Frequency Domain
* Hybrid Wheat Head Detection model with Incorporated CNN and Transformer, A
* Image Impression Estimation by Clustering People with Similar Tastes
* Interpreting Art by Leveraging Pre-Trained Models
* Intra-frame Skeleton Constraints Modeling and Grouping Strategy Based Multi-Scale Graph Convolution Network for 3D Human Motion Prediction
* Investigating self-supervised learning for Skin Lesion Classification
* Joint learning of images and videos with a single Vision Transformer
* Joint Learning with Group Relation and Individual Action
* Leveraging Embedding Information to Create Video Capsule Endoscopy Datasets
* Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Everyday Robot Navigation
* LOTS: Litter On The Sand dataset for litter segmentation
* Low-Level Feature Aggregation Networks for Disease Severity Estimation of Coffee Leaves
* Malware Detection Using Kernel Constrained Subspace Method
* MFFPN: an Anchor-Free Method for Patent Drawing Object Detection
* Mixed Distillation for Unsupervised Anomaly Detection
* Monocular Blind Spot Estimation with Occupancy Grid Mapping
* MS-VACSNet: A Network for Multi-scale Volcanic Ash Cloud Segmentation in Remote Sensing Images
* Multi-class Semantic Segmentation of Tooth Pathologies and Anatomical Structures on Bitewing and Periapical Radiographs
* Multi-Plane Projection for Extending Perspective Image Object Detection Models to 360° Images
* Multi-Prior Based Multi-Scale Condition Network for Single-Image HDR Reconstruction
* MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
* Object Detection for Embedded Systems Using Tiny Spiking Neural Networks: Filtering Noise Through Visual Attention
* Outline Generation Transformer for Bilingual Scene Text Recognition
* Padding Investigations for CNNs in Scene Parsing Tasks
* PALF: Pre-Annotation and Camera-LiDAR Late Fusion for the Easy Annotation of Point Clouds
* Panoptic Segmentation of Galactic Structures in LSB Images
* QAHOI: Query-Based Anchors for Human-Object Interaction Detection
* QaQ: Robust 6D Pose Estimation via Quality-Assessed RGB-D Fusion
* Quadruped Robot Platform for Selective Pesticide Spraying
* Safe height estimation of deformable objects for picking robots by detecting multiple potential contact points
* Safe Landing Zone Detection for UAVs using Image Segmentation and Super Resolution
* Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data
* Shape Preservation in Image Style Transfer for Gaze Estimation
* Small Object Detection for Birds with Swin Transformer
* Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach
* TinyPedSeg: A Tiny Pedestrian Segmentation Benchmark for Top-Down Drone Images
* TomatoDIFF: On-plant Tomato Segmentation with Denoising Diffusion Models
* Towards Achieving Lightweight Deep Neural Network for Precision Agriculture with Maize Disease Detection
* Transformer with Task Selection for Continual Learning
* Uncertainty Criteria in Active Transfer Learning for Efficient Video-Specific Human Pose Estimation
* Unsupervised Fall Detection on Edge Devices
* Using Unconditional Diffusion Models in Level Generation for Super Mario Bros
* Video Anomaly Detection Using Encoder-Decoder Networks with Video Vision Transformer and Channel Attention Blocks
* ViTVO: Vision Transformer based Visual Odometry with Attention Supervision
* Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer
* Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
* X3D Neural Network Analysis for Runner's Performance Assessment in a Wild Sporting Environment, An
* YOLOv5 with Mixed Backbone for Efficient Spatio-Temporal Hand Gesture Localization and Recognition
74 for MVA23
MVA25
* *MVA
* 3D Object Reconstruction Through Integration of Hyperspectral and RGB-D Imaging
* Advancing Disease Detection Using Deep Learning in Low-Data Environments
* Age Prediction of Komatsuna using Hu Moments with Neural Networks for Small Datasets
* Any-scale Object Detection using Arbitrary-scaled Images
* Automatic Rating Approach Using Machine Learning and Feature Selection for Finger Tapping in MDS-UPDRS Part III, An
* Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
* Binned MSE for Imbalanced Dust Density Estimation
* Boosting Small Object Tracking via Collaborative Detection Transformer
* Capturing Fine-Grained Alignments Improves 3D Affordance Detection
* CLIP-Guided Cross-Modal Feature Fusion based Few-Shot Learning for Nighttime Pavement Defect Detection
* Confidence-based Adaptive Weighted Boxes Fusion for Multi-Object Tracking of Small Birds
* Cross-Modal Knowledge Distillation from First-Person Views to Third-Person BEV Maps for Universal Point Goal Navigation
* Data-driven Head Motion Generation through Natural Gaze-Head Coordination
* Decoupled Scale and Appearance for Optimal Deep Diamond ReID
* Detecting Hand-Object Interaction Based on Movements in Hand Surrounding Region
* Detection of Medial Epicondyle Avulsion in Elbow Ultrasound Images via Bone Structure Reconstruction
* DLSF: Dual-Layer Synergistic Fusion for High-Fidelity Image Synthesis
* Domain Generalization of Pathological Image Segmentation by Patch-Level and WSI-Level Contrastive Learning
* Dynamic Age Estimation via Mixture of Experts: Bridging Semantic and Structural Models
* Edge-Augmented HLAC and Gaussian Distribution-Based Weighted Feature Extraction for 1-ms Abnormal Detection System in Logistics
* Efficient Skeleton-Based Action Recognition using Superposed Shape Subspace
* Enhancing Reliability of Medical Image Diagnosis through Top-rank Learning with Rejection Module
* FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
* FMDP: Leveraging a Foundation Model for Dual-Pixel Disparity Estimation
* Gaze Attention Estimation for Medical Environments
* Geometrically Constrained Position Estimation through Low-level Tracking
* IG-ODAM: Instance-Aware Visual Explanations for Object Detection with Integrated Gradients
* Impact of Optical System Size on Robustness in Laser Speckle Authentication
* Intersection-based Ensemble for Small Multi-Object Tracking in Challenging Environments
* IRR-RADA: A Reflection-Aware Saliency Map and Adaptive Curriculum Learning Based Data Augmentation Method for Image Reflection Removal
* Leveraging Masked Feature and Consistency Regularization for Unsupervised Domain Adaptation Based Semi-Supervised Semantic Segmentation
* Lightweight Convolutional Neural Network for Underwater Image Quality Enhancement, A
* Low-Latency Real-Time Audio-Driven Talking Head Generation Based on Future Speech Feature Prediction
* Magic for the Age of Quantized DNNs
* Minimalist Approach to HDR Image Compression with Applications to Low-Light Image Enhancement, A
* MobileSACNet: Lightweight Spectral-Spatial Compression for Hyperspectral Segmentation in Autonomous Driving Systems
* Modality Selection and Skill Segmentation via Cross-Modality Attention
* Modifying Generative Distributions in Latent Diffusion Models to Improve Alignment with Desired Properties
* MoExDA: Domain Adaptation for Edge-based Action Recognition
* Multi-Person Pose Estimation Evaluation Using Optimal Transportation and Improved Pose Matching
* MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
* Noise-based Regularized Training for Diffusion Models
* Object State Recognition in Cooking Videos through End State Frames Analysis
* Parallel Sampling of Diffusion Models on SO(3)
* Point Cloud Edge Extraction Based on 3D Point Separability Filter with Spherical Mask
* Pre-Manipulation Alignment Prediction for Open-Vocabulary Object Manipulation Based on End-Effector Trajectories
* Real-Time Fire Detection Using Hybrid Feature Extraction: Color, Texture, and Motion Analysis
* Revisiting Self-Generating Simple Figure Patterns for Learning Microscopy Image Segmentation
* RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes
* Scene Recognition Meets Knowledge Graphs: Enhancing Robustness to Object Diversity
* Self-supervised 3D Image Deblurring for Lattice Light Sheet Microscopy
* Semantic Segmentation of iPS Cells: Case Study on Model Complexity in Biomedical Imaging
* ShadowAug: A Multi-Strategy Data Augmentation Method for Image Shadow Removal
* Simple Yet Effective Way to Use Polarimetric Information in Stereo Matching
* Snapshot Hyperspectral Imaging using Petrographic Thin Section
* Statistic Temporal Checking and Depth Layering based Multi-Object Relative Size Estimation from Monocular Video
* Style-Preserving Diffusion for Scene Text Editing
* Supervised Domain Adaptation from Scene Text Recognition for Licence Plate Recognition
* Temporal Conditioning for Realistic Performance Video Generation from Instrumental Sounds
* Transformer-based Visual Grounding with Inter-Modality Cross Attention
* Unsupervised 3D Braided Hair Reconstruction from a Single-View Image
* Very Similar Appearance Feature Classification for Chronic Endometritis Diagnosis in Hysteroscopy Images
* Viewpoint-Aware 3D Dense Captioning
* ZECO: ZeroFusion Guided 3D MRI Conditional Generation
65 for MVA25