2512
* *3D Imaging, Modeling, Processing, Visualization and Transmission
* *Affective Behavior Analysis In-the-Wild
* *AgriVision: Agriculture-Vision: Challenges and Opportunities for Computer Vision in Agriculture
* *AI for Creative Visual Content Generation, Editing and Understanding
* *Autonomous Driving
* *Catch UAVs That Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle in the Wild
* *Computer Vision for Drug Discovery: Where Are We and What is Beyond?
* *Computer Vision for Microscopy Image Analysis
* *Computer Vision in Sports
* *CVPR
* *Data Driven Autonomous Driving Simulation
* *Distillation of Foundation Models for Autonomous Driving
* *Domain Generalization: Evolution, Breakthroughs, and Future Horizons
* *Efficient and On-Device Generation
* *Efficient Large Vision Models
* *Event-Based Vision
* *Explainable AI for Computer Vision Workshop
* *Exploring the Next Generation of Data
* *Federated Learning for Computer Vision
* *Fine-grained Visual Categorization
* *Foundation Models for V2X-Based Cooperative Autonomous Driving
* *Human Motion Generation
* *Image Matching: Local Features and Beyond
* *LatinX in CV Research
* *Mechanistic Interpretability for Vision
* *Mobile AI
* *Monocular Depth Estimation Challenge
* *Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures
* *Multimodal Algorithmic Reasoning Workshop
* *MVA
* *Navigating the Future: Ensuring Trustworthiness in Multi-Modal Open-World Intelligence
* *New Trends in Image Restoration and Enhancement
* *Open-World 3D Scene Understanding
* *Perception Beyond the Visible Spectrum
* *Pixel-level Understanding with Vision Foundation Models
* *Pixel-Level Video Understanding in the Wild Challenge
* *Precognition: Seeing Through the Future
* *ReGenAI: Second Workshop on Responsible Generative AI
* *Rhobin Challenge: Reconstruction of Human-Object Interaction
* *Safe Artificial Intelligence for All Domains
* *Sign Language Recognition, Translation and Production
* *SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets
* *Test-time Scaling for Computer Vision
* *Uncertainty Quantification for Computer Vision
* *Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics
* *Visual Anomaly and Novelty Detection
* *Visual Odometry and Computer Vision Applications Based on Location Clues
* *What is Next in Multimodal Foundation Models?
* *Women in Computer Vision
* *Workshop of Adversarial Machine Learning on Computer Vision: Foundation Models + X
* *Workshop on Foundation and Large Vision Models in Remote Sensing
* 360-GS: Layout-Guided Panoramic Gaussian Splatting for Indoor Roaming
* 3D Face Reconstruction From Radar Images
* 3D Object Reconstruction Through Integration of Hyperspectral and RGB-D Imaging
* 3D Reconstruction with Spatial Memory
* 3D Whole-Body Grasp Synthesis with Directional Controllability
* 3D-GPT: Procedural 3D Modeling with Large Language Models
* 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
* 3rd Multi-Modal Aerial View Image Challenge: Sensor Domain Translation - PBVS 2025
* 4D-Editor: Interactive Object-Level Editing in Dynamic Neural Radiance Fields via Semantic Distillation
* 4th Multi-Modal Aerial View Image Challenge: SAR Classification - PBVS 2025
* 6D Pose Estimation of Novel Objects: A Survey
* A2-GNN: Angle-Annular GNN for Visual Descriptor-Free Camera Relocalization
* AASTFNet: An Attention-Aware Spatial-Temporal Fusion Network for Enhanced Pain Intensity Evaluation in Facial Image
* Action Anticipation from SoccerNet Football Video Broadcasts
* Action Valuation in Sports: A Survey
* ActNAS: Generating Efficient YOLO Models Using Activation NAS
* Adaptive Far-Field Region of Interest Extraction and its Applications for Long-Range Ground Surveillance
* Adaptive Multi-Feature Fusion Algorithm for Ship Rust Detection on Coating Surfaces
* Adaptive Part Shifting for Fine-Grained Ship Classification in Remote Sensing Images
* Adaptor: Adaptive Token Reduction for Video Diffusion Transformers
* AdaVid: Adaptive Video-Language Pretraining
* Advancements in Affective and Behavior Analysis: The 8th ABAW Workshop and Competition
* Advancing Ambient Lighting Normalization via Diffusion Shadow Generation
* Advancing Disease Detection Using Deep Learning in Low-Data Environments
* Advancing Facial Age Progression for Occluded Faces
* Adversarially Domain-Adaptive Latent Diffusion for Unsupervised Semantic Segmentation
* Aerial Infrared Health Monitoring of Solar Photovoltaic Farms at Scale
* AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images
* AG-MAE: Anatomically Guided Spatio-Temporal Masked Auto-Encoder for Online Hand Gesture Recognition
* Age Prediction of Komatsuna using Hu Moments with Neural Networks for Small Datasets
* AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification
* Agri-FM+: a Self-Supervised Foundation Model for Agricultural Vision
* Agro-Net: a Convolution-Attention Fusion Based Hyperspectral Model for Agro-Food Quality Assessment
* AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones
* AI Hiring with LLMs: A Context-Aware and Explainable Multi-Agent Framework for Resume Screening
* AI-Based Video Content Understanding for Automatic and Interactive Multimedia Retrieval
* alpha-Surf: Implicit Surface Reconstruction for Semi-Transparent and Thin Objects with Decoupled Geometry and Opacity
* AMF-UNet: A Lightweight Adaptive Multi-Mamba Fusion U-Shaped Network for Medical Image Segmentation
* Analyzing Hierarchical Structure in Vision Models with Sparse Autoencoders
* AnomalyHybrid: A Domain-Agnostic Generative Framework for General Anomaly Detection
* Any-scale Object Detection using Arbitrary-scaled Images
* AppleGrowthVision: A Large-Scale Stereo Dataset for Phenological Analysis, Fruit Detection, and 3D Reconstruction in Apple Orchards
* Approximate 2D-3D Shape Matching for Interactive Applications
* ARC-Flow: Articulated, Resolution-Agnostic, Correspondence-Free Matching and Interpolation of 3D Shapes Under Flow Fields
* ARC-NeRF: Area Ray Casting for Broader Unseen View Coverage in Few-Shot Object Rendering
* ARDGen: Augmentation Regularization for Domain-Generalized Medical Report Generation
* Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
* Artificial Intelligence in CT-Based Diagnosis of Small Pulmonary Nodules: Current Applications and Future Perspectives
* AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements
* Attacking Attention of Foundation Models Disrupts Downstream Tasks
* Attention-Aware Temporal Adversarial Shadows on Traffic Sign Sequences
* Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models
* AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
* Augmented Reality Applications using Active Markers with an Event Camera
* Automated Essential Concept Discovery for Few-Shot Out-of-Distribution Detection
* Automatic Rating Approach Using Machine Learning and Feature Selection for Finger Tapping in MDS-UPDRS Part III, An
* Automatic Segmentation of Metaplasia in an Endoscopic Decision Support System
* Autonomous Multimodal Reasoning via Implicit Chain-of-Vision
* AutoVFX: Physically Realistic Video Editing from Natural Language Instructions
* Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization
* Behind the Magic, MERLIM: Multi-Modal Evaluation Benchmark for Large Image-Language Models
* Benchmarking Multi-Modal Semantic Segmentation Under Sensor Failures: Missing and Noisy Modality Robustness
* Best Linear Unbiased Estimation for 2D and 3D Flow with Event-Based Cameras
* Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects
* Better Coherence, Better Height: Fusing Physical Models and Deep Learning for Forest Height Estimation from Interferometric SAR Data
* Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
* Beyond Neurofibrillary Tangles: Explainable AI for Microscopic Tauopathy Classification in Immunofluorescence Imaging
* Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
* BiasBench: A Reproducible Benchmark for Tuning the Biases of Event Cameras
* Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
* BiGS: Bidirectional Primitives for Relightable 3D Gaussian Splatting
* BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models
* Binned MSE for Imbalanced Dust Density Estimation
* Binocular Vision-Based Infrastructure Crack Measurement with Morphological Union Enhancement
* Boosting Small Object Tracking via Collaborative Detection Transformer
* BRAT: Bidirectional Relative Positional Attention Transformer for Event-Based Eye Tracking
* Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation
* Bridging Detection and Re-Identification: Evaluating Trustworthiness and Error Propagation in Face Recognition Pipelines
* Bridging Morphology and Molecular Signatures: Multi-Task Deep Learning for Multi-Omics Prediction from Histopathology
* Bridging Self-Supervision and Mechanism of Action Discovery in Morphological Profiling
* Bridging the Modality Gap: Training-Free Adaptation of Vision-Language Models for Remote Sensing via Visual Prototypes
* CACP: Context-Aware Copy-Paste to Enrich Image Content for Data Augmentation
* CaddieSet: A Golf Swing Dataset with Human Joint Features and Ball Information
* CadenceRAG: Context-Aware and Dependency-Enhanced Retrieval Augmented Generation for Holistic Video Understanding
* Calibration-Free Method for Large-View Classroom People Counting with Object Detection-Based Structure Matching, A
* California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops
* CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control
* Camera-Only 3D Panoptic Scene Completion for Autonomous Driving Through Differentiable Object Shapes
* CameraHMR: Aligning People with Perspective
* Can Geometry Save Central Views for Sports Field Registration?
* Can Relevance Feedback, Conversational Search and Foundation Models Work Together for Interactive Video Search and Exploration?
* Can Vision-Language Models Understand and Interpret Dynamic Gestures from Pedestrians? Pilot Datasets and Exploration Towards Instructive Nonverbal Commands for Cooperative Autonomous Vehicles
* Capturing Fine-Grained Alignments Improves 3D Affordance Detection
* CARN: Complexity-Aware Routing Network for Efficient and Adaptive Inference
* CatFree3D: Category-Agnostic 3D Object Detection with Diffusion
* CDVS: Compressed Domain On Device Memory Efficient 8K Video SlowMo
* CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes
* CellRep: Multichannel Image Representation Learning Model
* CETrack: A Feature-Match-Based Framework for Lesion Tracking in CE Videos
* CFPNet: Improving Lightweight ToF Depth Completion via Cross-Zone Feature Propagation
* Choosing 'Right' from Wrong: A Closer Look at Selection Bias in Spatial Multiple-Choice Questions in Large Multimodal Models
* CityGen: Infinite and Controllable City Layout Generation
* Classification Drives Geographic Bias in Street Scene Segmentation
* CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates
* CLIP-Guided Cross-Modal Feature Fusion based Few-Shot Learning for Nighttime Pavement Defect Detection
* CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition
* Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation
* CLIPDraw++: Text-to-Sketch Synthesis with Simple Primitives
* CoDEx: Combining Domain Expertise for Spatial Generalization in Satellite Image Analysis
* CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences
* Combining Vision-Language Models and Weak Supervision for Nuanced Vision Classification Tasks
* Comparative Analysis of Object Detection Algorithms for Bolt Detection: Performance Evaluation of Faster R-CNN, SSD, RetinaNet and YOLOv8n
* Comparison Visual Instruction Tuning
* Compositional Image-Text Matching and Retrieval by Grounding Entities
* Compressed Domain Multiframe Processing
* Compression and Rendering of Time-varying Interplanetary Volumes
* CondiMen: Conditional Multi-Person Mesh Recovery
* Confidence-based Adaptive Weighted Boxes Fusion for Multi-Object Tracking of Small Birds
* Confidence-Calibrated Covariate Shift Correction for Few-Shot Classification in Vision-Language Models
* conSAMmé: Achieving Consistent Segmentations with SAM
* Construction of Three-Dimensional Memristor-Enhanced Polynomial Hyperchaotic Map and its Application in Image Security Protection
* Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
* COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
* Cross-Modal Consistency Learning for Sign Language Recognition
* Cross-Modal Facial Expression Recognition with Global Channel-Spatial Attention: Modal Enhancement and Proportional Criterion Fusion
* Cross-Modal Knowledge Distillation from First-Person Views to Third-Person BEV Maps for Universal Point Goal Navigation
* Cross-Spectral Body Recognition with Side Information Embedding: Benchmarks on LLCM and Analyzing Range-Induced Occlusions on IJB-MDF
* CSRN: Cross-Sensor Robust Recognition Network for Multi-Modal Aerial View Object Classification
* CTC: Contribution to Classification of Complex Features
* Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
* Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection
* CYFLOD: Cyclic Filtering and Loss Damping for Alleviating Noisy Labels in Fine-Grained Visual Classification
* CytoFM: The First Cytology Foundation Model
* D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition
* DAF: Distillation, Augmentation and Filtering Based Framework for Efficient Smartphone Human Activity Recognition
* Data Scaling Laws for End-to-End Autonomous Driving
* Data-driven Head Motion Generation through Natural Gaze-Head Coordination
* Dataformer: Differential Additive Transformer for Lightweight Semantic Segmentation
* Dataset for Semantic and Instance Segmentation of Modern Fruit Orchards, A
* Datasets for Valence and Arousal Inference: A Survey
* DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting
* Deciding the Path: Leveraging Multi-Agent Systems for Solving Complex Tasks
* DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal
* Decoding Vision Transformers: The Diffusion Steering Lens
* Decomposing Food Images for Better Nutrition Analysis: a Nutritionist-Inspired Two-Step Multimodal LLM Approach
* Decoupled Scale and Appearance for Optimal Deep Diamond ReID
* Decoupling Identity Confounders for Enhanced Facial Expression Recognition: An Information-Theoretic Approach
* Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis
* Deep Learning Model-Based Nudity Detection with Image Feature Extraction Approaches for GLAM Materials
* Deep Learning-Based Classification of Planar 99mTc Pyrophosphate Scintigraphy for the Diagnosis of Cardiac Amyloidosis
* Deep Polycuboid Fitting for Compact 3D Representation of Indoor Scenes
* Defending Against Frequency-Based Attacks with Diffusion Models
* Defending Against Transfer-Based Adversarial Attacks Using SVD-Driven Feature Evolution
* Define, Refine, Align: Correspondence-Free 3D Line Alignment with Attentional, Equivariant and Rotational Layers
* DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery
* DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform
* Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh
* DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
* DELTA: Dense Depth from Events and LiDAR Using Transformer's Attention
* DEMO: Point-Feature Tracking for Pixel Processor Arrays
* Denoising Monte Carlo Renders with Diffusion Models
* Dental Lesion Segmentation Method Based on Hypernetwork Improved Unet
* Detailed 3D Modeling and Component Monomerization Extraction of Buildings Using Close-Range Photogrammetry
* Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models
* Detecting Hand-Object Interaction Based on Movements in Hand Surrounding Region
* Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
* Detecting Looted Archaeological Sites from Satellite Image Time Series
* Detection and Localization of Drones and UAVs Using Sound and Vision
* Detection of Medial Epicondyle Avulsion in Elbow Ultrasound Images via Bone Structure Reconstruction
* Detector-Free Image Matching with Lightweight Backbone and Feature Filtering
* Diabetes Screening Algorithm Embedded with Inception Deep Convolution in Swin Transformer, A
* Diagnosis of Pediatric Hypopigmentary Dermatoses Based on Lightweight HierAttn Network
* Diffusion-Based Continuous Sign Language Generation with Cluster-Specific Fine-Tuning and Motion-Adapted Transformer
* Direct and Explicit 3D Generation from a Single Image
* Direct Zero-Shot Indoor Scene Recognition Method Based on Visual Question Answering, A
* Direction-Aware Hybrid Representation Learning for 3D Hand Pose and Shape Estimation
* Disentangling Polysemantic Channels in Convolutional Neural Networks
* Disentangling Visual Transformers: Patch-Level Interpretability for Image Classification
* Dist-Tracker: A Small Object-Aware Detector and Tracker for UAV Tracking
* Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution
* Distilling Normalizing Flows
* Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation
* DLSF: Dual-Layer Synergistic Fusion for High-Fidelity Image Synthesis
* DLST: Dual-Template Co-Evolution Learning for Robust Long-Term Drone Tracking in Dynamic Environments
* Document Image Rectification using Stable Diffusion Transformer
* Domain Adaptation for Skin Lesion: Evaluating Real-World Generalisation
* Domain Adaptation of VLM for Soccer Video Understanding
* Domain Generalization for Semantic Segmentation: A Survey
* Domain Generalization of Pathological Image Segmentation by Patch-Level and WSI-Level Contrastive Learning
* Domain Generalization Through Attenuation of Domain-Specific Information
* Dream-in-Style: Text-to-3D Generation Using Stylized Score Distillation
* DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer
* DressRecon: Freeform 4D Human Reconstruction from Monocular Video
* Drivable 3D Gaussian Avatars
* Drive4C: A Closed-Loop Benchmark on what Foundation Models Really Need to be Capable of for Language-Guided Autonomous Driving
* Drug Discovery Agent: An Automated Vision Detection System for Drug-Cell Interactions
* DSCViTANet: A Hybrid Depthwise Separable Convolution and Vision Transformer for Early Alzheimer's Classification
* Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
* Dual-Input Frequency-Aware Network for High-Quality Thermal Image Super-Resolution
* Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling
* Dual-Stage Cross-Modal Network with Dynamic Feature Fusion for Emotional Mimicry Intensity Estimation
* DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection
* Dust to Detail: Restoring Sand-dust Images with Frequency-Guided Attention and Multi-Scale Features
* Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
* Dynamic Age Estimation via Mixture of Experts: Bridging Semantic and Structural Models
* Dynamic EventNeRF: Reconstructing General Dynamic Scenes from Multi-View RGB and Event Streams
* Dynamic State-Control Modeling for Generalized Remote Sensing Image Super-Resolution
* Dynamic Watermarks in Images Generated by Diffusion Models
* DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction
* DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos
* E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting
* E-BARF: Bundle Adjusting Neural Radiance Fields from a Moving Event Camera
* E-VLC: A Real-World Dataset for Event-Based Visible Light Communication and Localization
* ECO-AI: Energy-Conscious Optimization for AI Training
* EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
* Edge-Augmented HLAC and Gaussian Distribution-Based Weighted Feature Extraction for 1-ms Abnormal Detection System in Logistics
* Effectiveness of Max-Pooling for Fine-Tuning CLIP on Videos
* Effectiveness of Training with Procedurally Generated Synthetic Images of Crop Plants
* Efficient 2D to Full 3D Human Pose Uplifting Including Joint Rotations
* Efficient and Scalable Framework for Lightweight Crop Disease Recognition in Low-Resource Settings, An
* Efficient Burst Super-Resolution with One-Step Diffusion
* Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds
* Efficient Image Generation with Variadic Attention Heads
* Efficient Method for Measuring Oil Casing Thread Geometric Parameters Using Point Cloud Data, An
* Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
* Efficient Skeleton-Based Action Recognition using Superposed Shape Subspace
* Efficient Task-Specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization
* Efficient VideoMAE via Temporal Progressive Training
* Efficiently Mitigating Video Content Misalignment on Large Vision Model with Time-Series Data Alignment
* EffiHeritageNet: Efficient Semantic Segmentation Method for Intangible Cultural Heritage Scenes
* Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction
* EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
* EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference
* EL-Attack: Explicit and Latent Space Hybrid Optimization based General and Effective Attack for Autonomous Driving Trajectory Prediction
* Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
* Emotions in LatAm: A New Dataset and Benchmark for Emotion Recognition in Latin America
* EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis
* Empirical Study for Efficient Video Quality Assessment, An
* End-To-End Pipeline for Virtual Banner Replacement in Football Broadcasts, An
* Enforcing View-Consistency in Class-Agnostic 3D Segmentation Fields
* Enhance Then Search: an Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection
* Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume
* Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution
* Enhancing Facial Expression Recognition with LSTM Through Dual-Direction Attention Mixed Feature Networks and CLIP
* Enhancing Few-Shot Class-Incremental Learning via Frozen Feature Augmentation
* Enhancing Multi-Modal Automatic Target Recognition Using Out-of-Distribution Exploitation (MATRODE)
* Enhancing Multimodal Sarcasm Detection Via Global and Local Prompt Mechanisms
* Enhancing Reliability of Medical Image Diagnosis through Top-rank Learning with Rejection Module
* Enhancing Vision Transformer Explainability using Artificial Astrocytes
* ePBR: Extended PBR Materials in Image Synthesis
* EV-Flying: An Event-Based Dataset for In-the-Wild Recognition of Flying Objects
* EV-LayerSegNet: Self-Supervised Motion Segmentation Using Event Cameras
* Evaluating Text-to-Video Alignment: A Hierarchical Benchmark for Video Generation Models
* EvenFormer: Dynamic Even Transformer for Real-World Image Restoration
* Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space
* Event-Based Continuous Color Video Decompression from Single Frames
* Event-Based Eye Tracking. 2025 Event-Based Vision Workshop
* Event-Based Tracking and Imaging of Randomly Moving Objects in Dense Dynamical Scattering Media
* Event-Conditioned Dual-Modal Fusion for Motion Deblurring
* Event-Driven Dynamic Attention for Multi-Object Tracking on Neuromorphic Hardware
* ExaM: Unsupervised Concept-Based Representation Learning to Better Explain Models in Vision Tasks
* Exemplar Masking for Multimodal Incremental Learning
* Expanded SPAN for Efficient Super-Resolution
* Explainable Physical PolSAR Autoencoders for Soil Moisture Estimation
* Explaining 3D Point Cloud Semantic Segmentation Models Through Adversarial Attacks
* Exploiting Adversarial Learning and Topology Augmentation for Open-Set Visual Recognition
* Exploiting Frequency Correlation for Hyperspectral Image Reconstruction
* Exploration of the Mechanisms Underlying Corneal Decompensation Using Graph Neural Networks
* Exploring Emotional Engagement with Responsible AI Constructs: A Video-Based Cognitive Experiment
* Exploring Missing Modality in Multimodal Egocentric Datasets
* Exploring Modality Guidance to Enhance VFM-Based Feature Fusion for UDA in 3D Semantic Segmentation
* Exploring Semi-Supervised Learning for Online Mapping
* Exploring Temporal Dynamics in Event-Based Eye Tracker
* Extra-Lightweight AI-Based Privacy Preserving Framework for Egocentric Wearable Cameras
* Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging
* Face Reconstruction from Face Embeddings Using Adapter to a Face Foundation Model
* FaceGest: A Comprehensive Facial Gesture Dataset for Human-Computer Interaction
* Fairness-Aware Boosting Model for Imbalanced 3D Point Cloud Segmentation in Autonomous Driving
* FALCON: Fast Image Haze Removal Leveraging Continuous Density Mask
* Fast Sphericity and Roundness Approximation in 2D and 3D Using Local Thickness
* FastGrasp: Efficient Grasp Synthesis with Diffusion
* FCTFANet: A Fused CNN-Transformer Feature Aggregator Network for Image Restoration
* Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection
* Feature Matching in the Dark: Homography-Based RGB-IR Feature Transformation for Low-Light Vision
* FedAlign: Federated Domain Generalization With Cross-Client Feature Alignment
* FedCAPR: Federated Camera-Aware Unsupervised Person Re-Identification with Identity-Distributed Equalization for Decentralized Data Clustering
* FedCIAL: Federated Color-Invariant Adversarial Learning for Enhancing Fairness and Performance in Skin Lesion Classification
* FedDG-MoE: Test-Time Mixture-of-Experts Fusion for Federated Domain Generalization
* FedSECA: Sign Election and Coordinate-Wise Aggregation of Gradients for Byzantine Tolerant Federated Learning
* Few-Shot Adaptation of Grounding DINO for Agricultural Domain
* FieldMOT: A Field-Registered Multi-Object Tracking for Sports Videos
* Fine-Grained Artist Identification Method for Authentication and Attribution of Drawings Using Hatching Lines, A
* Fine-Grained Few-Shot Classification with Part Matching
* FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment
* Flar-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression
* Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
* Flow-Guided Deformable Alignment with Channel-Wise Self-Attention Reconstruct for Efficient Burst HDR Restoration
* FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
* FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
* FM-LoRA: Factorized Low-Rank Meta-Prompting for Continual Learning
* FMDP: Leveraging a Foundation Model for Dual-Pixel Disparity Estimation
* FOCUS: Multi-View Foot Reconstruction from Synthetically Trained Dense Correspondences
* Food Degradation Analysis Using Multimodal Fuzzy Clustering
* FoodVideoQA: A Novel Baseline Framework for Dietary Monitoring
* FORCE: Physics-Aware Human-Object Interaction
* ForesightNav: Learning Scene Imagination for Efficient Exploration
* Forest Fire and Smoke Recognition Based on YOLO
* Forget Less, Learn More: Contrastive-Based Federated Class Incremental Learning with a Low-Dimensional Projection Layer
* Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization
* FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control
* Fourth Monocular Depth Estimation Challenge, The
* FPD: Fringe Photometric Deflectometry when Fringe Meets Photometric Stereo
* FQ-EMCI-Net: A Multi-Head Attention CNN-DQN Approach with Filtered Q-Learning and Equilibrium Monte Carlo Initialization for SP-DLBP
* FreBIS: Frequency-Based Stratification for Neural Implicit Surface Representations
* Frequency-Prior Enhanced Ambient Lighting Normalization Via Visual Perceptual Refinement
* FrogDogNet: Fourier Frequency Retained Visual Prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
* From Beats to Scores: A Multi-Modal Framework for Comprehensive Figure Skating Assessment
* From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction
* From Data to Design: Leveraging Frequency Statistics for Efficient Neural Network Architectures
* From Precomputed Particle Shading to Volumetric Atmospheric Cloud Rendering for Real-Time Gaming: Methods and Advances
* FullCycle: Full Stage Adversarial Attack for Reinforcement Learning Robustness Evaluation
* Fully-Geometric Cross-Attention for Point Cloud Registration
* FungiTastic: A Multi-Modal Dataset and Benchmark for Image Categorization
* FusedVision: A Knowledge-Infusing Approach for Practical Anomaly Detection in Real-World Surveillance Videos
* Fusion or Confusion? A Look at Dataset Pooling for Infrared Object Detection
* FUSION: Frequency-Guided Underwater Spatial Image recOnstructioN
* FusionNet: Multi-Model Linear Fusion Framework for Low-Light Image Enhancement
* G-Buffer Supported Neural Screen-Space Refraction Baking for Real-Time Global Illumination
* Garment3DGen: 3D Garment Stylization and Texture Generation
* GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details
* Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video
* GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor
* Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes
* GaussianStyle: Gaussian Head Avatar via StyleGAN
* GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting
* Gaze Attention Estimation for Medical Environments
* Gen3DSR: Generalizable 3D Scene Reconstruction Via Divide and Conquer From a Single View
* Generalizable Unsupervised Microscopy Video Denoising via Weighted SpatioTemporal Sampling
* Generative AI for Film Creation: A Survey of Recent Advances
* Generative AI Game Jam Case Study From October 2024, A
* Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models
* Geometric Correspondence Consistency in RGB-D Relative Pose Estimation
* Geometrically Constrained Position Estimation through Low-level Tracking
* Geometry-Aware Feature Matching for Large-Scale Structure from Motion
* Geometry-Aware Texture Generation for 3D Head Modeling with Artist-Driven Control
* Geometry-Guided Cross-View Diffusion for One-to-Many Cross-View Image Synthesis
* Get a GRIP on Test Time Adaptation! - Group Robust Inference-Time Policy Optimization for Vision Models
* GLNet-YOLO: Research on Pedestrian Detection Technology Based on Multimodal Feature Fusion
* gMINT: Gradient-based Membership Inference Test Applied to Image Models
* Goal-Driven Human Motion Synthesis in Diverse Tasks
* Good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval
* GPT-FL: Generative Pre-Trained Model-Assisted Federated Learning
* Gradient-Guided Optimization for Large Motion Video Frame Interpolation
* GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
* GRS: Generating Robotic Simulation Tasks from Real-World Images
* GS-Pose: Generalizable Segmentation-Based 6D Object Pose Estimation with 3D Gaussian Splatting
* GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
* GVP: Generative Volumetric Primitives
* HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering
* Harmonizing Attention Fields with Knowledge Distillation for Multi-View 3D Object Detection
* HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models
* HCS-DFC: A Diffusion Classifier for Mode of Action Prediction Using Morphological Profiles
* HDC: Hierarchical Distillation for Multi-Level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation
* HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs
* HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
* HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors
* HeAL3D: Heuristical-enhanced Active Learning for 3D Object Detection
* Hierarchical Multi-Task Restoration Network for Old Photo Enhancement
* Hierarchical Semantic Segmentation with Autoregressive Language Modeling
* High-Precision Human Pose Estimation Algorithm Based on Multi-View LiDAR and Visible Light Sensors
* Highway Signage Breakage Detection Algorithm Based on Improved YOLOv8
* HILoF-DETR: A Lightweight Framework for SAR Ship Detection with Spatial Frequency Enhancement and Dynamic Alignment
* HMD2: Environment-Aware Motion Generation from Single Egocentric Head-Mounted Device
* HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
* HoleGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-Speech Gestures
* HopNet: Harmonizing Object Placement Network for Realistic Image Generation via Object Composition
* How Does the Machine Perceive Depth for Indoor Single Images with CNN?
* How Good is my Video-LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
* How Much Noise is There in Labels Generated by Humans? A Method to Validate Automatically Generated Bounding Boxes
* Human Eye Optics Simulation and Visual Modeling of Myopia Correction
* Human Mesh Reconstruction of Sports Players with Multiple Dynamic Cameras
* Human vs. Machine Minds: Ego-Centric Action Recognition Compared
* Human-Robot Navigation Using Event-Based Cameras and Reinforcement Learning
* HumMorph: Generalized Dynamic Human Neural Fields From Few Views
* Hybrid AI-Physical Modeling for Penetration Bias Correction in X-Band InSAR DEMs: A Greenland Case Study
* IAUNet: Instance-Aware U-Net
* IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
* Ice Hockey Puck Localization Using Contextual Cues
* ICT-QA: Question Answering Over Multi-Modal Contexts Including Image, Chart, and Text Modalities
* IdolDanceNet: Indian Heritage Idol Dance Pose Classification
* iFusion: Inverting Diffusion for Pose-Free Reconstruction From Sparse Views
* IG-ODAM: Instance-Aware Visual Explanations for Object Detection with Integrated Gradients
* IGL-DT: Iterative Global-Local Feature Learning with Dual-Teacher Semantic Segmentation Framework Under Limited Annotation Scheme
* IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment
* Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
* IMC: A Benchmark for Invariant Learning Under Multiple Causes
* Impact of Optical System Size on Robustness in Laser Speckle Authentication
* Implicit Diffusion-Based Super-Resolution for Intangible Cultural Heritage Images
* Improved 3DUNet+ with Inter-Slice Difference Awareness for Pulmonary Vessel CT Image Segmentation, An
* Improved Out-of-Distribution Detection with Additive Angular Margin Loss
* Improved Repeat and Concatenate: A More Effective 2D X-Ray to 3D CT Image Translation Model
* Improved YOLOv8n Algorithm for Small Object Detection in Road Scenes, An
* Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge
* Improving Open-World Object Localization by Discovering Background
* Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties
* Improving Weather-Based OOD Generalisation in Lidar-Based Object Detection Models via Adversarial Training
* iNatAg: Multi-Class Classification Models Enabled by a Large-Scale Benchmark Dataset with 4.7M Images of 2,959 Crop and Weed Species
* Incorporating Dense Metric Depth into Neural 3D Representations for View Synthesis and Relighting
* Inferring Driving Maps by Deep Learning-Based Trail Map Extraction
* INPC: Implicit Neural Point Clouds for Radiance Field Rendering
* INRet: A General Framework for Accurate Retrieval of INRs for Shapes
* Instance Feature Caching for Cross-Domain Few-Shot Object Detection
* Instruction-Augmented Multimodal Alignment for Image-Text and Element Matching
* Integrating Knowledge for High-Fidelity Remote Sensing Detection of Cross-River Bridges
* Interactive Agent Foundation Model, An
* Interactive Humanoid: Online Full Body Human Motion Reaction Synthesis with Social Affordance Forecasting and Canonicalization
* Interactive Multimodal Framework with Temporal Modeling for Emotion Recognition
* Intersection-based Ensemble for Small Multi-Object Tracking in Challenging Environments
* InterTrack: Tracking Human Object Interaction Without Object Templates
* Intriguing Properties of Robust Classification
* Investigating Mechanisms for In-Context Vision Language Binding
* Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting
* IRR-RADA: A Reflection-Aware Saliency Map and Adaptive Curriculum Learning Based Data Augmentation Method for Image Reflection Removal
* Is Multi-Person Gait Recognition Feasible Under Mutual Occlusion? A Human Model Regression-Based Approach
* Is Temporal Prompting All We Need for Limited Labeled Action Recognition?
* ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
* Iterative Event-Based Motion Segmentation by Variational Contrast Maximization
* Iterative Similarity Perturbation Point Cloud Registration Based on Deformation-Resistant Region Detection
* JADE: Joint-Aware Latent Diffusion for 3D Human Generative Modeling
* Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions
* Jump-Aware: Player Position Rectification and Identification in Dynamic Sports Using Jump Event Spotting
* KernFusNet: Implicit Kernel Modulation and Fusion for Blind Super-Resolution
* Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment
* Knowledge Distillation Based Binarized Separable Convolutional Neural Network for Underwater Acoustic Target Recognition, A
* Knowledge Distillation from First-Order Representation for Visual State Space Model
* KOFFVQA: An Objectively Evaluated Free-Form VQA Benchmark for Large Vision-Language Models in the Korean Language
* LADI v2: Multi-Label Dataset and Classifiers for Low-Altitude Disaster Imagery
* LangCoop: Collaborative Driving with Language
* LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
* LangOcc: Open Vocabulary Occupancy Estimation via Volume Rendering
* Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation
* LAPIS: A Novel Dataset for Personalized Image Aesthetic Assessment
* LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming
* Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning, A
* Large-Scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining, A
* Latent Patched Efficient Diffusion Model for High Resolution Image Synthesis
* Learned Lightweight Smartphone ISP with Unpaired Data
* Learned Smartphone ISP on Mobile GPUs, Mobile AI 2025 Challenge: Report
* Learning Assisted Interactive Modelling with Rough Freehand 3D Sketch Strokes
* Learning from Noise: Enhancing DNNs for Event-Based Vision Through Controlled Noise Injection
* Learning Naturally Aggregated Appearance for Efficient 3D Editing
* Learning Optical Flow Field via Neural Ordinary Differential Equation
* Learning Pose-Aware Representations in Vision Transformers for Understanding Activities of Daily Living
* Learning to Drive from a World Model
* Less Biased Noise Scale Estimation for Threshold-Robust RANSAC
* Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes
* Leveraging Fixed and Dynamic Pseudo-Labels in Cross-Supervision Framework for Semi-Supervised Medical Image Segmentation
* Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing
* Leveraging Lightweight Facial Models and Textual Modality in Audio-Visual Emotional Understanding in-the-Wild
* Leveraging Masked Feature and Consistency Regularization for Unsupervised Domain Adaptation Based Semi-Supervised Semantic Segmentation
* Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment
* Leveraging Multimodal Large Language Models for Referring Camouflaged Object Detection
* Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation
* Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
* LFMix: A Lightweight Hybrid Architecture for Light Field Super-Resolution
* LFTramba: Comprehensive Information Learning for Light Field Image Super-Resolution via a Hybrid Transformer-Mamba Framework
* LFTransMamba: A Hybrid Mamba-Transformer Model for Light Field Image Super-Resolution
* Lightplane: Highly-Scalable Components for Neural 3D Fields
* Lightweight Anonymous Authenticated Key Agreement Protocol for V2I with Multi-TA Model, A
* Lightweight Convolutional Neural Network for Underwater Image Quality Enhancement, A
* Lightweight Moment Retrieval System with Global Re-Ranking and Robust Adaptive Bidirectional Temporal Search, A
* Lightweight Perception-Driven Compression Method for Social Media Images, A
* Live Demonstration: Neurotouch - A Neuromorphic Vision-Based Tactile Sensor for Real-Time Gesture Recognition
* Live Demonstration: Real-Time Event-Data Processing with Graph Convolutional Neural Networks and SoC FPGA
* LLaVA-SCo: Teach Vision Language Models to Self-Correct
* LLM Framework for Long-Form Video Retrieval and Audio-Visual Question Answering Using Qwen2/2.5, An
* LLM-Enabled Multi-Agent Autonomous Mechatronics Design Framework, An
* LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi
* LMFormer: Lane Based Motion Prediction Transformer
* LNTransformer: Lung Nodule Transformer for Sparse CT Segmentation
* Location-Free Scene Graph Generation
* Looking into the Shadow: Recording a Total Solar Eclipse with High-Resolution Event Cameras
* LoopSplat: Loop Closure by Registering 3D Gaussian Splats
* Low-Frame-Rate Cell Tracking: Unmet Needs and Future Directions
* Low-Latency Real-Time Audio-Driven Talking Head Generation Based on Future Speech Feature Prediction
* Low-Light Image Enhancement Algorithm Based on Information Fusion Strategy
* Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions
* LSE-NeRF: Learning Sensor Modeling Errors for Deblured Neural Radiance Fields with RGB-Event Stereo
* LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation
* LViT-GMMs: Semantic Segmentation for Maritime Object Detection
* LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool
* M-Adaptor: Text-Driven Whole-Body Human Motion Generation
* MAC++: Going Further with Maximal Cliques for 3D Registration
* Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
* MAD: Makeup All-in-One with Cross-Domain Diffusion Model
* Magic for the Age of Quantized DNNs
* Magnetic Tile Defect Detection with Cross-Scale Visual Feature Fusion: A Cascade Framework of Improved YOLOv11 and SAM Segmentation
* Maize Ear Sensing for on-Farm Yield Predictions
* Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling
* Mamba-VA: A Mamba-Based Approach for Continuous Emotion Recognition in Valence-Arousal Space
* Mapping Biodiversity at Very-High Resolution in Europe
* Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-View Images
* MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking
* Masked Face Recognition Method with ArcFace Fusion of Attention and Focal Loss
* MASt3R-SfM: A Fully-Integrated Solution for Unconstrained Structure-from-Motion
* MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
* MAVEN: Multi-Modal Attention for Valence-Arousal Emotion Network
* Maximizing Aerial Detection of Organic Objects in Non-Exhaustively Searchable Survey Areas
* MDMP: Multi-Modal Diffusion for Supervised Motion Predictions with Uncertainty
* MegaLoc: One Retrieval to Place Them All
* MerCulture: A Comprehensive Benchmark to Evaluate Vision-Language Models on Cultural Understanding in Singapore
* MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
* Mesh Extraction for Unbounded Scenes Using Camera-Aware Octrees
* MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
* MetricCol: Metric Depth and Pose Estimation in Colonoscopy via Geometric Consistency and Domain Adaptation
* MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling
* Minimalist Approach to HDR Image Compression with Applications to Low-Light Image Enhancement, A
* Mipmap-GS: Let Gaussians Deform with Scale-Specific Mipmap for Anti-Aliasing Rendering
* Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
* Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation
* mli-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields
* mmDiffusion: mmWave Diffusion for Sequential 3D Human Dense Point Cloud Generation
* MMDrive: Multi-Modal Remote Physiological Signal Measurement Dataset for Driver Status Monitoring
* MObi: Multimodal Object Inpainting Using Diffusion Models
* MobileSACNet: Lightweight Spectral-Spatial Compression for Hyperspectral Segmentation in Autonomous Driving Systems
* MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
* Modality Selection and Skill Segmentation via Cross-Modality Attention
* Modifying Generative Distributions in Latent Diffusion Models to Improve Alignment with Desired Properties
* MoExDA: Domain Adaptation for Edge-based Action Recognition
* Mof-Image: Generating Mixture-of-Features Video Game Image Dataset via GPU Rendering Simulation
* MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
* Monocular 3D Reconstruction Based on Deep Convolutional Neural Networks
* MonoPatchNeRF: Improving Neural Radiance Fields with Patch-Based Monocular Guidance
* MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model
* Morphological Correction Method for River Skeleton Lines Based on Sampling Point Offsets
* MorphoSkel3D: Morphological Skeletonization of 3D Point Clouds for Informed Sampling in Object Classification and Retrieval
* MotionDreamer: Exploring Semantic Video Diffusion Features for Zero-Shot 3D Mesh Animation
* MTA-VPS: A Large-Scale Benchmark for Video-Based Person Search
* MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection
* Multi-Agent Systems for Robotic Autonomy with LLMs
* Multi-aspect Knowledge Distillation with Large Language Model
* Multi-Dimensional Quality Assessment for UGC Videos via Modular Multi-Modal Vision-Language Models
* Multi-Entity Video Transformers for Fine-Grained Video Representation Learning
* Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection
* Multi-Layer Radial Basis Function Networks for Out-of-Distribution Detection
* Multi-Modal Cooperative Distillation for Zero-Shot Multi-Label Classification
* Multi-Person Pose Estimation Evaluation Using Optimal Transportation and Improved Pose Matching
* Multi-Scale Information-Driven Rock Classification Algorithm Based on Enhanced ResNet, A
* Multi-Spectral Imaging and Data Fusion for Real-Time Bleeding Detection
* Multimodal 3D Object Detection on Unseen Domains
* Multimodal Emotion Prediction in Interpersonal Videos Integrating Facial and Speech Cues
* Multimodal Generalized Category Discovery
* Multimodal Rationales for Explainable Visual Question Answering
* Multiple Instance Learning for Visual Grain Quality Analysis Without Instance-Level Annotation
* MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
* MVCM: Enhancing Multi-View and Cross-Modality Alignment for Medical Visual Question Answering and Medical Image-Text Retrieval
* NadirFloorNet: Reconstructing Multi-Room Floorplans from a Small Set of Registered Panoramic Images
* Nanoparticle Diameter Measurements with Event Camera Tracking
* Naturally Computed Scale Invariance in the Residual Stream of ResNet18
* Near-Incident Detection in Railroad Environments: Lateral Distance Estimation from Train-Mounted Monocular Camera
* Neighbor-Based Feature and Index Enhancement for Person Re-Identification
* NeIn: Telling What You Don't Want
* NeuHMR: Neural Rendering-Guided Human Motion Reconstruction
* NeuRadar: Neural Radiance Fields for Automotive Radar Point Clouds
* New Tai Le Character Recognition System Based on ModelArts Platform, The
* Nexar Dashcam Collision Prediction Dataset and Challenge
* NExNet Seg: Neuron Expansion Network for Medical Image Segmentation
* No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond
* No-MambAAD: Revitalizing Conv-Only Networks for Unsupervised Anomaly Detection
* Noise Algorithms in Game Terrain Generation
* Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
* Noise-based Regularized Training for Diffusion Models
* NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization
* Novel 3D Decoder with Weighted and Learnable Triple Attention for 3D Microscopy Image Segmentation, A
* NTIRE 2025 Ambient Lighting Normalization Challenge Report
* NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results
* NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
* NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results
* NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results
* NTIRE 2025 Challenge on HR Depth From Images of Specular and Transparent Surfaces
* NTIRE 2025 Challenge on Image Super-Resolution (×4): Methods and Results
* NTIRE 2025 Challenge on Light Field Image Super-Resolution: Methods and Results
* NTIRE 2025 Challenge on Low Light Image Enhancement: Methods and Results
* NTIRE 2025 Challenge on Night Photography Rendering
* NTIRE 2025 Challenge on Raw Image Restoration and Super-Resolution
* NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results
* NTIRE 2025 Challenge on Short-Form UGC Video Quality Assessment and Enhancement: KWAISR Dataset and Study
* NTIRE 2025 Challenge on Short-Form UGC Video Quality Assessment and Enhancement: Methods and Results
* NTIRE 2025 Challenge on Single Image Reflection Removal in the Wild: Datasets, Methods and Results
* NTIRE 2025 Challenge on Text to Image Generation Model Quality Assessment
* NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results
* NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results
* NTIRE 2025 Image Shadow Removal Challenge Report
* NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge
* NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results
* Obfuscation Based Privacy Preserving Representations Are Recoverable Using Neighborhood Information
* Object Agnostic 3D Lifting in Space and Time
* Object is Worth 64×64 Pixels: Generating 3D Object via Image Diffusion, An
* Object State Recognition in Cooking Videos through End State Frames Analysis
* ObjectCarver: Semi-Automatic Segmentation, Reconstruction and Separation of 3D Objects
* Oblique-MERF: Revisiting and Improving MERF for Oblique Photography
* OccludeNeRF: Geometry-Aware 3D Scene Inpainting with Collaborative Score Distillation in NeRF
* OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields
* On the Robustness of GUI Grounding Models Against Image Attacks
* On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
* One-to-Many Fine-Grained Matching Between UAV Images and Satellite Images for UAV Self-Localization
* Online 3D Scene Reconstruction Using Neural Object Priors
* Online Gaussian Test-Time Adaptation of Vision-Language Models
* OnlyFlow: Optical Flow Based Motion Conditioning for Video Diffusion Models
* Open Dataset and Enhancement Method for Long-Wave Thermal Diurnal Material Classification
* Open-Vocabulary Semantic Part Segmentation of 3D Human
* OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting
* OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
* OpticFusion: Multi-Modal Neural Implicit 3D Reconstruction of Microstructures by Fusing White Light Interferometry and Optical Microscopy
* Optimal Mixture Model Distribution Alignment-Based 3D-2D Gaussian Splatting Registration for Monocular Endoscopic AR Guidance
* Optimising Vision Transformer Performance on Limited Datasets: A Multi-Gradient Approach
* Out-of-Distribution Detection with Adversarial Outlier Exposure
* Out-of-Distribution Segmentation in Autonomous Driving: Problems and State of the Art
* Outlier-Robust Multi-Model Fitting on Quantum Annealers
* Overview of the 1st International Workshop on Interactive Video Search and Exploration
* P2P-NET: A PSO-Vision Framework for Accurate Detection and Multi-Class Classification of Parasitic Eggs in Human and Animal in Microscopy Images
* PAN-RSVQA: Vision Foundation Models as Pseudo-Annotators for Remote Sensing Visual Question Answering
* PanoDreamer: Consistent Text to 360-Degree Scene Generation
* Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
* Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis
* Parallel Sampling of Diffusion Models on SO(3)
* Particle Rendering: Implicitly Aggregating Incident and Outgoing Light Fields for Novel View Synthesis
* PartStickers: Generating Parts of Objects for Rapid Prototyping
* PaSTe: Improving the Efficiency of Visual Anomaly Detection at the Edge
* PatchContrast: Self-Supervised Pre-Training for 3D Object Detection
* PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
* Perturbed State Space Feature Encoders for Optical Flow with Event Cameras
* PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers
* PF3Det: A Prompted Foundation Feature Assisted Visual Lidar 3D Detector
* Physical-Model-Guided Dual-Branch Generative Adversarial Network for Thin Cloud Removal
* Physics-Based Human Pose Estimation from a Single Moving RGB Camera
* PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
* PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach
* PiCaZo: Pixel-Aligned Contrastive Learning for Zero-Shot Domain Adaptation
* PineSORT: A Simple Online Real-Time Tracking Framework for Drone Videos in Agriculture
* PIR: Photometric Inverse Rendering with Shading Cues Modeling and Surface Reflectance Regularization
* Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB
* PLVM: A Tuning-Free Approach for Personalized Large Vision-Language Model
* PlückeRF: A Line-Based 3D Representation for Few-View Reconstruction
* Point Cloud Edge Extraction Based on 3D Point Separability Filter with Spherical Mask
* Point Cloud Stitching Approach Based on Image Registration for High-Precision Threaded Surface Modeling in Multi-View 3D Imaging, A
* Polar Coordinate-Based 2D Pose Prior with Neural Distance Field
* Pose-Aware Weakly-Supervised Action Segmentation
* Pose-to-Pose: A New Task and Benchmark for Human Pose Transition in Yoga
* PoseGuru: Landmarks for Explainable Pose Correction using Exemplar-Guided Algorithmic Recourse
* PoseSynVIT: Lightweight and Scalable Vision Transformers for Human Pose Estimation
* Power of Augmentations in IR Object Detection, The
* PPTracker: Tracking UAV Swarms with Prior Prompt
* Pre-Manipulation Alignment Prediction for Open-Vocabulary Object Manipulation Based on End-Effector Trajectories
* Predicting Butterfly Species Presence from Satellite Imagery Using Soft Contrastive Regularisation
* PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario
* Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction
* Probabilistic Online Event Downsampling
* Probabilistic Perspective-n-Lines for Indoor Camera Pose Estimation
* Probing Vulnerabilities of Vision-Lidar Based Autonomous Driving Systems
* Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians
* Progressive Autoregressive Video Diffusion Models
* Prompt Categories Cluster for Weakly Supervised Semantic Segmentation
* Prompt the Missing: Efficient and Robust Audio-Visual Classification Under Uncertain Modalities
* Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
* Prompt-Tuning SAM: From Generalist to Specialist with Only 2,048 Parameters and 16 Training Images
* PromptNorm: Image Geometry Guides Ambient Light Normalization
* ProtoPatchNet: An Interpretable Patch-Based Prototypical Network
* Prototype-Based Continual Learning with Label-Free Replay Buffer and Cluster Preservation Loss
* Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data
* PS4PRO: Pixel-to-Pixel Supervision for Photorealistic Rendering and Optimization
* Pseudo-Labelling Meets Label Smoothing for Noisy Partial Label Learning
* PUF-Assisted Lightweight Mutual Authentication of Low-Cost RFID Tags for Medical Privacy Preservation, A
* Pureformer: Transformer-Based Image Denoising
* Pushing the Limits of LiDAR: Accurate Performance Analysis of Indoor 3D LiDARs
* PVUW 2025 Challenge Report: Advances in Pixel-Level Understanding of Complex Videos in the Wild
* Q-CIDNet: Perceptual Quality Aware Color and Intensity Decoupling Network for Video Quality Enhancement
* QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-Free Visual Document Understanding
* Quadrocular, Neuromorphic Stereo Triangulation and Asynchronous Data Fusion for 3D Object Tracking
* Quality Assessment for Talking Head Videos via Multi-Modal Feature Representation
* Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2025 Challenge: Report
* Quantum Federated Learning for Multimodal Data: A Modality-Agnostic Approach
* RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving
* RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning
* RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
* Raw Image Reconstruction From RGB on Smartphones. NTIRE 2025 Challenge Report
* Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment
* Reading in the Dark with Foveated Event Vision
* Real-Time Detection Method for Surface Defects in 3D Printing Based on YOLOv12 Algorithm, A
* Real-Time Fire Detection Using Hybrid Feature Extraction: Color, Texture, and Motion Analysis
* Real-Time Pedestrian Detection at the Edge on a Fully Asynchronous Neuromorphic System
* Real-Time Simulation of Destructible Objects: From Rigid Fractures to Soft-Body Deformation
* Real-Time Ultra-Fine-Grained Surgical Instrument Classification
* RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
* Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
* ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
* Recursive Multi-Exposure Alignment with Spatiotemporal Decoupling for Efficient Burst HDR and Restoration
* REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
* Reference Segmentation Network Based on Feature Interaction Enhancement
* ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking
* Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
* REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
* Rel-SA: Alzheimer's Disease Detection Using Relevance-augmented Self Attention by Inducing Domain Priors in Vision Transformers
* RepFC: Universal Structural Reparametrization Block for High Performance, Lightweight Deep Neural Networks
* Repurposing SAM for User-Defined Semantics Aware Segmentation
* Research on an Intelligent Security Door Passenger Flow Statistics System Based on an Improved Deep Learning Human Body Recognition Algorithm
* Research on Defect Detection of Wire Rope of Mine Hoist Based on Feature Embedding
* Research on Semantic Communication Based on Balancing of Task Distortion
* Research on Ultrasound Image Feature Enhancement Based on Frequency-Domain Self-Attention
* Rethinking Compressive Sensing: A Compression Framework for Video Super-Resolution
* Rethinking the Role of Spatial Mixing
* Retinex-Guided Histogram Transformer for Mask-Free Shadow Removal
* Reversible Grayscale Method Based on Bit-Field Multi-Channel Fusion Encoding, A
* Revisiting Multi-Modal LLM Evaluation
* Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
* Revisiting Self-Generating Simple Figure Patterns for Learning Microscopy Image Segmentation
* Revolutionizing Drug Discovery: Integrating Spatial Transcriptomics with Advanced Computer Vision Techniques
* RGB Photo Enhancement on Mobile GPUs, Mobile AI 2025 Challenge: Report
* RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes
* Rig3DGS: Creating Controllable Portraits From Casual Monocular Videos
* Rigid Body Adversarial Attacks
* RISE-SDF: A Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering
* Robust 3D Watermarking Method Based on Statistical Features, A
* Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset
* Robust AD: A Real World Benchmark Dataset for Robustness in Industrial Anomaly Detection
* Robust Stage-Wise LVLM Adaptation: Multi-Phase Prompt LoRA Fine-Tuning for Compound Expression Recognition
* Robust Translation Synchronization Algorithm, A
* Robustifying Point Cloud Networks by Refocusing
* Robustness Evaluation for Video Models with Reinforcement Learning
* Robusto-1 Dataset: Comparing Humans and VLMs on Real Out-Of-Distribution Autonomous Driving VQA from Peru
* S-Band SAR Target Classification Via 2D and 3D Deep Learning Methods
* S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications
* s2p-hd: GPU-Accelerated Binocular Stereo Pipeline for Large-Scale Same-Date Stereo
* Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task
* SAGA: Semantic-Aware Gray Color Augmentation for Visible-to-Thermal Domain Adaptation Across Multi-View Drone and Ground-Based Vision Systems
* Salient Object Detection with Dynamic Convolutions
* SAM4EM: Efficient Memory-Based Two Stage Prompt-Free Segment Anything Model Adapter for Complex 3D Neuroscience Electron Microscopy Stacks
* SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
* SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation
* SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data
* SBAS-InSAR Phase Unwrapping Method Integrating ICU and SNAPHU: A Case Study of Dalian City, An
* SBFS-Net: Smoke Segmentation via Separated Smoke and Background Features
* SC-NeRF: NeRF-Based Point Cloud Reconstruction Using a Stationary Camera for Agricultural Applications
* SCAF-YOLO: Multi-Scale Feature Fusion for Small Object Detection in Remote Sensing Images
* Scale-Invariant Implicit Neural Representations for Object Counting
* Scaling Laws in Zero-Shot Gender Classification Using CLIP
* Scaling On-Device GPU Inference for Large Generative Models
* Scaling Test-Time Compute can Outperform Larger Architectures in Computer Vision
* Scene Recognition Meets Knowledge Graphs: Enhancing Robustness to Object Diversity
* Scene-Specific Anomalous Relationship Detection Using Scene Graph Summarization
* SceneMotifCoder: Example-Driven Visual Program Learning for Generating 3D Object Arrangements
* ScoreCAM++: Gated Score-Weighted Visual Explanations for CNNs
* Secret Point Recognition Algorithm via Test-Time Augmentation Based on Large Language Models
* Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions
* Seeing like a Cephalopod: Colour Vision with a Monochrome Event Camera
* Segment Any Primitive: Zero-Shot 3D Primitive Segmentation from Point Cloud
* Segment AnyNeuron
* Selective Test-Time Domain Adaptation Using Fisher Information for Robust Facial Expression Recognition In-the-Wild
* Self-Supervised 3D Image Deblurring for Lattice Light Sheet Microscopy
* Self-Supervised Pretraining for Fine-Grained Plankton Recognition
* Semantic Matters: Multimodal Features for Affective Analysis
* Semantic Segmentation of iPS Cells: Case Study on Model Complexity in Biomedical Imaging
* Semantic-Aware Local Image Editing with a Single Mask Operation
* SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets
* Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation, A
* Semi-Supervised Object-Wise Anomaly Detection for Firearm and Firearm Component Detection in X-Ray Security Imagery
* Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning, A
* Separating Shared and Domain-Specific LoRAs for Multi-Domain Learning
* ShadowAug: A Multi-Strategy Data Augmentation Method for Image Shadow Removal
* ShadowSG: Spherical Gaussian Illumination from Shadows
* Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose
* Short-Term 3D Human Mesh Recovery with Virtual Markers Disentanglement
* Show or Tell? A Benchmark to Evaluate Visual and Textual Prompts in Semantic Segmentation
* SILK: Smooth InterpoLation Framework for Motion in-Betweening a Simplified Computational Approach
* SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging
* SimCache: Similarity Caching for Efficient VLM-based Scene Understanding
* Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising, A
* Simple Detector with Frame Dynamics is a Strong Tracker, A
* Simple Yet Effective Way to Use Polarimetric Information in Stereo Matching
* Single-Stage Uncertainty-Aware Jersey Number Recognition in Soccer
* SK-RD4AD: Skip-Connected Reverse Distillation for Robust One-Class Anomaly Detection
* Skin Lesion Classification using Dermoscopic Images and Clinical Metadata: Insights from Multimodal Models
* Skor-Xg: Skeleton-Oriented Expected Goal Estimation in Soccer
* SkyCloud360: Sky and Cloud Segmentation in Equirectangular Images
* Slot Attention-Based Feature Filtering for Few-Shot Learning
* SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work
* SmallGS: Gaussian Splatting-Based Camera Pose Estimation for Small-Baseline Videos
* SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
* Smooth Cache: A Universal Inference Acceleration Technique for Diffusion Transformers
* SMORE: Simultaneous Map and Object REconstruction
* Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
* Snapshot Hyperspectral Imaging using Petrographic Thin Section
* SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding
* SoyStageNet: Balancing Accuracy and Efficiency for Real-Time Soybean Growth Stage Detection
* SPAFormer: Sequential 3D Part Assembly with Transformers
* SparseGS: Sparse View Synthesis Using 3D Gaussian Splatting
* Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
* Spatio-Temporal State Space Model for Efficient Event-Based Optical Flow
* Spectro-Textural Integration in Mangrove Delineation: A Case Analysis of Aboitiz Cleanergy Park, Davao City, Philippines
* SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion
* SPIdepth: Strengthened Pose Information for Self-Supervised Monocular Depth Estimation
* Splat-SLAM: Globally Optimized RGB-Only SLAM with 3D Gaussians
* SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting
* SplatTouch: Explicit 3D Representation Binding Vision and Touch
* Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise
* Sport Field Calibration with NeRF-Guided Camera Optimization from a Single Image
* SportMamba: Adaptive Non-Linear Multi-Object Tracking with State Space Models for Team Sports
* Spurfies: Sparse-View Surface Reconstruction Using Local Geometry Priors
* SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion
* SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
* SSRFlow: Semantic-Aware Fusion with Spatial Temporal Re-Embedding for Real-World Scene Flow
* STAM: Zero-Shot Style Transfer Using Diffusion Model via Attention Modulation
* STAPLE: Siamese Transformer Assisted Pseudo Label Ensembling for Unsupervised Domain Adaptation in No-Reference IQA
* Statistic Temporal Checking and Depth Layering based Multi-Object Relative Size Estimation from Monocular Video
* Stochastic-Based Patch Filtering for Few-Shot Learning
* Stock Price Prediction and Investment Strategy via Machine Learning Model Fusion
* Stokes-S0 Prior-Guided Dual-Branch Network for Polarized Image Enhancement
* Strong Baseline for Multi-Person Tracking in Thermal Infrared Imagery, A
* Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
* StrongSiamTracker: A Siamese Tracker with Dynamic Global Detection for Robust Anti-UAV Tracking
* STRRNet: Semantics-Guided Two-Stage Raindrop Removal Network
* Studying Image Diffusion Features for Zero-Shot Video Object Segmentation
* Style-Preserving Diffusion for Scene Text Editing
* Supervised Domain Adaptation from Scene Text Recognition for Licence Plate Recognition
* Surface Defect Detection of Chip Images Based on the Improved FCOS with SENet, The
* SurfR: Surface Reconstruction with Multi-Scale Attention
* Surprising Utility of Group Partitioning in Improving Conformal Prediction of Visual Classifiers Under Distributional Shifts, The
* Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges, A
* SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation
* SwarmDiff: Swarm Robotic Trajectory Planning in Cluttered Environments via Diffusion Transformer
* SwinPaste: A Swin Transformer-Based Framework for RGB-Guided Thermal Image Super-Resolution
* Syn3DTxt: Embedding 3D Cues for Scene Text Generation
* Synthesizing Consistent Novel Views Via 3D Epipolar Attention Without Re-Training
* Synthetic Data Augmentation using Pre-trained Diffusion Models for Long-tailed Food Image Classification
* Synthetic Dataset for Group Activity Recognition
* T-SAM: Transductive Learning for Segment Anything Model
* Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model
* Task-Agnostic Attacks Against Vision Foundation Models
* Task-Conditioned Ensemble of Expert Models for Continuous Learning
* Task-Informed Meta-Learning for Remote Sensing
* Task-Level Contrastiveness for Cross-Domain Few-Shot Learning
* TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos
* TEDRA: Text-Based Editing of Dynamic and Photoreal Actors
* Temporal Conditioning for Realistic Performance Video Generation from Instrumental Sounds
* Temporal Consistent Semantic Video Color Transfer from Multiple References
* Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report, The
* Tenth NTIRE 2025 Image Denoising Challenge Report, The
* TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data
* Text-Guided Patch Scoring and Local Distortion Guidance for Image Quality Assessment
* TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
* Texture2LoD3: Enabling LoD3 Building Reconstruction with Panoramic Images
* Thermal Image Super-Resolution Challenge Results - PBVS 2025
* Thermal Pedestrian Multiple Object Tracking Challenge (TP-MOT)
* TLAC: Two-Stage LMM Augmented CLIP for Zero-Shot Classification
* To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition
* ToF-360 - A Panoramic Time-of-Flight RGB-D Dataset for Single Capture Indoor Semantic 3D Reconstruction
* TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs
* Toward Automation in Text-Based Video Retrieval with LLM Assistance
* Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer
* Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
* Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach
* Towards Evaluating the Robustness of Visual State Space Models
* Towards Exploring Continual Learning for Toxicologic Pathology in Pharmaceutical Drug Discovery
* Towards Faster and More Compact Foundation Models for Molecular Property Prediction
* Towards Fine-Grained Spatial Control for Soccer Game Image Generation
* Towards Foundation Models for 3D Vision: How Close are We?
* Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model
* Towards Low-Latency Event-Based Obstacle Avoidance on a FPGA-Drone
* Towards Robust Multimodal AU Detection: STN-Enhanced Visual Encoding and Audio-Visual Spatial-Temporal Alignment
* Towards Scale-Aware Low-Light Enhancement Via Structure-Guided Transformer Design
* Towards Synthetic Concept Activation Vectors via Generative Models
* Towards Trustworthy Autonomous Vehicles with Vision-Language Models under Targeted and Untargeted Adversarial Attacks
* Towards Unconstrained 2D Pose Estimation of the Human Spine
* Traffic Sign Recognition Under Visual Perturbations: Shadows, Light Patches, and Simulated Obstructions
* Train-Borne Video Intelligent Solution for High-Speed Railway Infrastructure Inspection, A
* Training Data Reconstruction: Privacy Due to Uncertainty?
* Training Neural Networks on RAW and HDR Images for Restoration Tasks
* Training-Free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
* TrajGNAS: Heterogeneous Multiagent Trajectory Prediction Based on a Graph Neural Architecture Search
* Transformer-Based Lung Infection Severity Prediction with Cross Attention and Conditional TransMix Augmentation
* Transformer-based Visual Grounding with Inter-Modality Cross Attention
* Tree-Structure Transformer for Skeleton-Based Human Action Recognition
* Trishul: Towards Region Identification and Screen Hierarchy Understanding for Large VLM Based GUI Agents
* True Hyperspectral Image Super-Resolution Dataset, A
* Trustworthy Multi-UAV Collaboration: A Self-Supervised Framework for Explainable and Adversarially Robust Decision-Making
* TT3D: Table Tennis 3D Reconstruction
* TTGen: Incorporating Test-Time Scaling to Diffusion Models
* TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models
* Turin3D: Evaluating Adaptation Strategies Under Label Scarcity in Urban Lidar Segmentation with Semi-Supervised Techniques
* Two Views are Better Than One: Monocular 3D Pose Estimation with Multiview Consistency
* U-ARE-ME: Uncertainty-Aware Rotation Estimation in Manhattan Environments
* U-Shape Mamba: State Space Model for Faster Diffusion
* Uncertainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation
* Uncertainty Quantification for Gradient-Based Explanations in Neural Networks
* Uncertainty-Guided Style-Aware Probabilistic Perceptual Quality Assessment for AI-Generated Images
* Uncovering Branch Specialization in InceptionV1 Using K Sparse Autoencoders
* Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
* Understanding Depth and Height Perception in Large Visual-Language Models
* Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning
* Unimotion: Unifying 3D Human Motion Synthesis and Understanding
* UNIT: Unsupervised Online Instance Segmentation Through Time
* United We Stand, Divided We Fall: Handling Weak Complementarity for Audio-Visual Emotion Recognition in Valence-Arousal Space
* UniToken: Harmonizing Multimodal Understanding and Generation Through Unified Visual Encoding
* Universal Shape of Strong Remote Adversarial Patches for Object Detection with Convolutional Neural Networks
* Unsupervised 3D Braided Hair Reconstruction from a Single-View Image
* Unveiling Histopathological Features of Breast Cancers Using Limited Data
* UPPET: Unified Pedestrian Pose Estimation in Thermal Imaging
* UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video
* V-NAW: Video-Based Noise-Aware Adaptive Weighting for Facial Expression Recognition
* V3LMA: Visual 3D-Enhanced Language Model for Autonomous Driving
* Vehicle Accident Detection in Video Surveillance Based on BiFPN-YOLOv8
* Vehicle Detection Under Complex Weather Conditions Based on an Adaptive Model
* Very Similar Appearance Feature Classification for Chronic Endometritis Diagnosis in Hysteroscopy Images
* Video, How do Your Tokens Merge?
* ViDROP: Video Dense Representation Through Spatio-Temporal Sparsity
* Viewpoint-Aware 3D Dense Captioning
* Virtual Pose Coach: A Motion-Retargeting Approach for Pose Training
* Visible-Infrared Person Re-Identification with Modality-Specific Expert
* Vision Language Models for Massive MIMO Semantic Communication
* VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning
* ViSkin: Physics-Based Simulation of Virtual Skin on Personalized Avatars
* VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation
* Visual Question Answering on Multiple Remote Sensing Image Modalities
* Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification, A
* Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization
* Visually Interpretable Subtask Reasoning for Visual Question Answering
* ViT4V: A Video Classification Method for the Detection of Varroa Destructor from Honeybees
* VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
* VNL-STES: A Benchmark Dataset and Model for Spatiotemporal Event Spotting in Volleyball Analytics
* Vocabulary-Free 3D Instance Segmentation with Vision-Language Assistant
* Vocabulary-Free Few-Shot Learning for Vision-Language Models
* VolTex: Food Volume Estimation Using Text-Guided Segmentation and Neural Surface Reconstruction
* Volume Measurement Technology of Dispensing Transparent Adhesives Based on Line Laser Scanning
* VRAG: Retrieval-Augmented Video Question Answering for Long-Form Videos
* VRU-CIPI: Crossing Intention Prediction at Intersections for Improving Vulnerable Road Users Safety
* VXP: Voxel-Cross-Pixel Large-Scale Camera-LiDAR Place Recognition
* WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting
* WaveDIF: Wavelet Sub-Band based Deepfake Identification in Frequency Domain
* Wavelet-Based Mechanistic Interpretability of Vision Transformers via Frequency-Aware Ablations
* Way Up: A Dataset for Hold Usage Detection in Sport Climbing, The
* Weakly Supervised Panoptic Segmentation for Defect-Based Grading of Fresh Produce
* What is the Added Value of UDA in the VFM Era?
* What Makes for a Good Stereoscopic Image?
* Wheat3DGS: In-Field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting
* When Textures Deceive: Weakly Supervised Industrial Anomaly Detection with Adapted-Loss CycleGAN
* Where Is the Ball: 3D Ball Trajectory Estimation From 2D Monocular Tracking
* Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
* WildlifeReID-10k: Wildlife Re-Identification Dataset with 10k Individual Animals
* Window Token Concatenation for Efficient Visual Large Language Models
* WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts
* X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models
* XiEff Representation for Interpretable Near-Field Imaging
* XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis
* XYScanNet: A State Space Model for Single Image Deblurring
* YOLOv8-GAD: A Lightweight Model for Wheat Ear Counting in Field for UAV Edge Computing
* Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation
* ZECO: ZeroFusion Guided 3D MRI Conditional Generation
* Zero-Shot Denoising for Fluorescence Lifetime Imaging Microscopy with Intensity-Guided Learning
* Zero-Shot Object Detection with Knowledge Enhancement Via Dual-Branch Subgraph Reasoning
* ZeroPS: High-Quality Cross-Modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
* ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving
1013 for 2512