Keith Price Bibliography update Details for 2508

Update Dates 2508

2508 * *CVPR
* 2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
* 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
* 3D Dental Model Segmentation with Geometrical Boundary Preserving
* 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
* 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency
* 3D Memristive Cubic Map With Dual Discrete Memristors: Design, Implementation, and Application in Image Encryption, A
* 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
* 3D Prior is All You Need: Cross-Task Few-shot 2D Gaze Estimation
* 3D Student Splatting and Scooping
* 3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
* 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
* 3D-GSW: 3D Gaussian Splatting for Robust Watermarking
* 3D-HGS: 3D Half-Gaussian Splatting
* 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
* 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
* 3D-MVP: 3D Multiview Pretraining for Manipulation
* 3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping
* 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
* 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting
* 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
* 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
* 4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
* 4Deform: Neural Surface Deformation for Robust Shape Interpolation
* 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
* 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians
* 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
* 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
* 8K@120fps Advanced Entropy Coding Hardware Design for AVS3, An
* A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
* A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models
* AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP
* ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects
* ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
* Aberration-Aware Depth-From-Focus
* AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
* ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
* Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation
* Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
* Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
* Accuracy assessment of PlanetScope SuperDove products for aquatic reflectance retrieval over Brazilian inland and coastal waters
* Accurate Differential Operators for Hybrid Neural Fields
* Accurate Extraction of Rural Residential Buildings in Alpine Mountainous Areas by Combining Shadow Processing with FF-SwinT
* Accurate LiDAR-Inertial SLAM Based on Multi-Category Feature Extraction and Matching, An
* Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
* ACE: Anti-Editing Concept Erasure in Text-to-Image Models
* Achieving Plasticity-Stability Trade-Off in Continual Learning Through Adaptive Orthogonal Projection
* ACL: Activating Capability of Linear Attention for Image Restoration
* ACLC-Detection: A Network for Remote Sensing Image Detection Based on Attention Mechanism and Lightweight Convolution
* Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
* Across-Beam Signal Integration Approach with Ubiquitous Digital Array Radar for High-Speed Target Detection
* Action Detail Matters: Refining Video Recognition with Local Action Queries
* Activating Sparse Part Concepts for 3D Class Incremental Learning
* Active Contour Model Driven by Non-Local Feature Fitting Energy Function With Scalable Normalization
* Active Data Curation Effectively Distills Large-Scale Multimodal Models
* Active Event-based Stereo Vision
* Active Hyperspectral Imaging Using an Event Camera
* ActiveGAMER: Active GAussian Mapping through Efficient Rendering
* AdaAugment: A Tuning-Free and Adaptive Approach to Enhance Data Augmentation
* AdaCM2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
* AdaDARE-y: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
* AdaGCL+: An Adaptive Subgraph Contrastive Learning Toward Tackling Topological Bias
* AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
* AdaptCMVC: Robust Adaption to Incremental Views in Continual Multi-view Clustering
* Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
* Adapting Dense Matching for Homography Estimation with Grid-Based Acceleration
* Adapting Generic RGB-D Salient Object Detection for Specific Traffic Scenarios
* Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception
* Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
* Adapting to Observation Length of Trajectory Prediction via Contrastive Learning
* Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
* Adaptive Array Shape Estimation and High-Resolution Sensing for AUV-Towed Linear Array Sonar During Turns
* Adaptive CNN-Based Approach for Improving SWOT-Derived Sea-Level Observations Using Drifter Velocities, An
* Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
* Adaptive Event-Triggered Optimal Tracking Control for Wheeled Mobile Robots Considering Force-Velocity Hybrid Constraints
* Adaptive Keyframe Sampling for Long Video Understanding
* Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
* Adaptive Multi-Resolution Feature Fusion for Fine-Grained Visual Classification
* Adaptive near Real-Time RFI Mitigation Using Karhunen-Loève Transform
* Adaptive Non-uniform Timestep Sampling for Accelerating Diffusion Model Training
* Adaptive Parameter Selection for Tuning Vision-Language Models
* Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
* Adaptive Rectangular Convolution for Remote Sensing Pansharpening
* Adaptive Searching Range-Based Data Association for Multi-Object Tracking With Multi-Information Fusion
* Adaptive SVD-Based Approach to Clutter Suppression for Slow-Moving Targets, An
* Adaptive Trajectory Correction for Underwater Object Tracking
* Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
* Adaptive water body detection: Integrating deep learning, normalised difference water index, and vector data for farm dam water monitoring with OmniWaterMask
* ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution
* ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
* AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments
* ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation
* Adv-Cpg: A Customized Portrait Generation Framework with Facial Adversarial Attacks
* Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
* Advances in 3D Neural Stylization: A Survey
* Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
* Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
* Advancing Grapevine Disease Detection Through Airborne Imaging: A Pilot Study in Emilia-Romagna (Italy)
* Advancing Intrusion Detection in V2X Networks: A Comprehensive Survey on Machine Learning, Federated Learning, and Edge AI for V2X Security
* Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset
* Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging
* Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
* Advancing Self-Supervised Learning for Building Change Detection and Damage Assessment: Unified Denoising Autoencoder and Contrastive Learning Framework
* Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
* Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
* Adversarial Diffusion Compression for Real-World Image Super-Resolution
* Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
* Aerial path planning for 3D urban scene reconstruction with dual-task reconstructability learning and adaptive viewpoints selection
* Aerial-Ground Cross-View Vehicle Re-Identification: A Benchmark Dataset and Baseline
* AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
* AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
* AeSPa: Attention-guided Self-supervised Parallel imaging for MRI Reconstruction
* Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
* AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
* AffordDP: Generalizable Diffusion Policy with Transferable Affordance
* AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
* AFNE-Net: Semantic Segmentation of Remote Sensing Images via Attention-Based Feature Fusion and Neighborhood Feature Enhancement
* AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
* Agronomic Information Extraction from UAV-Based Thermal Photogrammetry Using MATLAB
* AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
* AI-Powered LiDAR Point Cloud Understanding and Processing: An Updated Survey
* AIDAS: AI-Enhanced Intrusion Detection and Authentication for Autonomous Vehicles
* AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
* AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
* AIpparel: A Multimodal Foundation Model for Digital Garments
* AirRoom: Objects Matter in Room Reidentification
* AKiRa: Augmentation Kit on Rays for optical video generation
* Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
* ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency
* Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing
* Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
* Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
* AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-Modal Alignment
* Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
* All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
* All Weather Radar Image Enhancement and Semantic Segmentation Method for Autonomous Vehicles
* All-Day Multi-Camera Multi-Target Tracking
* All-directional Disparity Estimation for Real-world QPD Images
* All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
* All-Weather Precipitable Water Vapor Retrieval over Land Using Integrated Near-Infrared and Microwave Satellite Observations
* AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
* AMO Sampler: Enhancing Text Rendering with Overshooting
* Amplitude-Modulated Singular Value Decomposition for Ultrafast Ultrasound Imaging of Gas Vesicles
* AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation
* Analysis of Droughts and Floods Evolution and Teleconnection Factors in the Yangtze River Basin Based on GRACE/GFO
* Analysis of LULC and Urban Thermal Variations in Industrial Cities Using Earth Observation Indices and Machine Learning: A Case Study of Gujranwala, Pakistan
* Analysis of the Dynamic Process of Tornado Formation on 28 July 2024
* Analyzing the Implicit Bias of Adversarial Training From a Generalized Margin Perspective
* Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
* Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
* Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D
* Angular Super-Resolution of Forward-Looking Scanning Radar via Grid-Updating Split SPICE-TV
* AniDoc: Animation Creation Made Easier
* AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video
* AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
* Animal-CLIP: A Dual-Prompt Enhanced Vision-Language Model for Animal Action Recognition
* Animate and Sound an Image
* AnimateAnything: Consistent and Controllable Animation for Video Generation
* AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
* AniMo: Species-Aware Model for Text-Driven Animal Motion Generation
* ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
* Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation
* Anomaly Detection in Medical Images Using Encoder-Attention-2Decoders Reconstruction
* AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
* Anomize: Better Open Vocabulary Video Anomaly Detection
* Anthropogenic Activities Dominate Vegetation Improvement in Arid Areas of China
* Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
* Any-Resolution AI-Generated Image Detection by Spectral Learning
* Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
* Any6D: Model-free 6D Pose Estimation of Novel Objects
* Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
* AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
* AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
* AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
* AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes
* AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
* AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
* APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
* Apollo: An Exploration of Video Understanding in Large Multimodal Models
* Application and Comparison of Satellite-Derived Sea Surface Temperature Gradients to Identify Seasonal and Interannual Variability off the California Coast: Preliminary Results and Future Perspectives
* Application of a Bicubic Quasi-Uniform B-Spline Surface Fitting Method for Characterizing Mesoscale Eddies in the Atlantic Ocean
* Application of Landsat High Spatial Resolution Phenological Synthesized Data in Mountainous Land Cover Classification
* Application of Uncertainty to Out-of-Distribution Detection for Autonomous Driving Perception Safety
* Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
* Applying Deep Learning Methods for a Large-Scale Riparian Vegetation Classification from High-Resolution Multimodal Aerial Remote Sensing Data
* APT: Adaptive Personalized Training for Diffusion Models with Limited Data
* AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
* Arbitrary Mode-3 Dimensional Tensor-Tensor Product for Tensor Train Decomposition From Interaction Perspective, An
* Arbitrary-steps Image Super-resolution via Diffusion Inversion
* Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance
* ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
* Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?
* Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
* Argus: A Compact and Versatile Foundation Model for Vision
* Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
* ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
* ARM: Appearance Reconstruction Model for Relightable 3D Generation
* Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
* Art of Deception: Color Visual Illusions and Diffusion Models, The
* ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
* ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
* Articulated Kinematics Distillation from Video Diffusion Models
* ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
* ArtiFade: Learning to Generate High-quality Subject from Blemished Images
* Artificial Surface Water Construction Aggregated Water Loss Through Evaporation in the North China Plain
* ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
* ASHiTA: Automatic Scene-Grounded HIerarchical Task Analysis
* ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
* Assessing and Learning Alignment of Unimodal Vision and Language Models
* Assessing Mangrove Forest Recovery in the British Virgin Islands After Hurricanes Irma and Maria with Sentinel-2 Imagery and Google Earth Engine
* Assessing Model Trade-Offs in Agricultural Remote Sensing: A Review of Machine Learning and Deep Learning Approaches Using Almond Crop Mapping
* Assessing the Potential of PlanetScope Imagery for Iron Oxide Detection in Antimony Exploration
* Assessment of Aerosol Optical Depth, Cloud Fraction, and Liquid Water Path in CMIP6 Models Using Satellite Observations
* Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown
* Associative Transformer
* Asynchronous Collaborative Graph Representation for Frames and Events
* ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
* AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
* ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
* ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
* Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
* Attention Distillation: A Unified Approach to Visual Characteristics Transfer
* Attention IoU: Examining Biases in CelebA using Attention Maps
* Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning
* Attribute-Based Pre-Authenticated Secure Communication Protocol Enabling Key Protection and Credential Online-Upgrading for 5G NR V2X, An
* Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
* Attribute-Missing Multi-view Graph Clustering
* AU-Net: Adaptive Unified Network for Joint Multi-Modal Image Registration and Fusion
* AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
* Audio-Visual Instance Segmentation
* Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
* Augmented Deep Contexts for Spatially Embedded Video Coding
* Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
* Augmenting Perceptual Super-Resolution via Image Quality Predictors
* AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
* Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
* Auto-Encoded Supervision for Perceptual Image Super-Resolution
* Auto-Prompting SAM for Container Detection and Localization in Container Yards
* AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
* Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
* Automated Proof of Polynomial Inequalities via Reinforcement Learning
* Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
* Automatic Spectral Calibration of Hyperspectral Images: Method, Dataset and Benchmark
* Autonomous Vehicle Overtaking Trajectory Based on Cubic Bézier Spirals: Analysis Under Multiple Physical Constraints
* AutoPresent: Designing Structured Visuals from Scratch
* Autoregressive Distillation of Diffusion Transformers
* Autoregressive Sequential Pretraining for Visual Tracking
* Autoregressive Temporal Modeling for Advanced Tracking-by-Diffusion
* AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
* AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration
* AvatarArtist: Open-Domain 4D Avatarization
* AvatarStudio: High-Fidelity and Animatable 3D Avatar Creation from Text
* AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
* AVLTrack: Dynamic Sparse Learning for Aerial Vision-Language Tracking
* AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning
* A^2M^2-Net: Adaptively Aligned Multi-Scale Moment for Few-Shot Action Recognition
* Backdoor for Debias: Mitigating Model Bias With Backdoor Attack-Based Artificial Bias
* BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
* BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
* BADGR: Bundle Adjustment Diffusion Conditioned by GRadients for Wide-Baseline Floor Plan Reconstruction
* BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
* Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization
* Balanced Rate-Distortion Optimization in Learned Image Compression
* Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration
* Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy
* BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
* Basket: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
* Bayesian Procedures for Modeling Truck Route Choices
* Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
* Bayesian Test-Time Adaptation for Vision-Language Models
* BCTDNet: Building Change-Type Detection Networks with the Segment Anything Model in Remote Sensing Images
* Be More Specific: Evaluating Object-centric Realism in Synthetic Images
* Believing is Seeing: Unobserved Object Detection using Generative Models
* Benchmarking and Analyzing Generative Data for Visual Recognition
* Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
* Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery
* beta-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation
* BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
* Beyond Algorithm Updates: A Systematic Validation of GPM DPR-V07 over China's Multiscale Topography
* Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation
* Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data
* Beyond Generation: A Diffusion-based Low-level Feature Extractor for Detecting AI-generated Images
* Beyond Human Perception: Understanding Multi-Object World from Monocular View
* Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning
* Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
* Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
* Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
* Beyond the Backbone: A Quantitative Review of Deep-Learning Architectures for Tropical Cyclone Track Forecasting
* Beyond the Grid: GLRT-Based TomoSAR Fast Detection for Retrieving Height and Thermal Dilation
* Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
* BF-STVSR: B-Splines and Fourier: Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
* BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
* BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering
* BHViT: Binarized Hybrid Vision Transformer
* Bias for Action: Video Implicit Neural Representations with Bias Modulation
* Bias-Free Training Paradigm for More General AI-generated Image Detection, A
* Bidirectional Cross-Scrambling Medical Image Encryption Scheme Incorporates Compressed Sensing and Its Application in IoMT, A
* BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
* BiLoRA: Almost-orthogonal Parameter Spaces for Continual Learning
* BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
* BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
* BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
* Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing
* Binarized Neural Network for Multi-spectral Image Fusion
* Biomass Estimation of Apple and Citrus Trees Using Terrestrial Laser Scanning and Drone-Mounted RGB Sensor
* BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
* BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
* BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
* BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
* Birth and Death of a Rose
* BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
* Black Hole-Driven Identity Absorbing in Diffusion Models
* Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
* Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
* BlackboxBench: A Comprehensive Benchmark of Black-Box Adversarial Attacks
* BLADE: Single-View Body Mesh Estimation through Accurate Depth Estimation
* BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
* Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model
* Blind Image Quality Assessment by Gaussian Mixture Distribution
* BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
* BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
* Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images
* BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
* Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB
* Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries
* BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment
* BOLT: Boost Large Vision-Language Model Without Training for Long-Form Video Understanding
* Boltzmann Attention Sampling for Image Analysis with Small Objects
* Boost the Inference with Co-Training: A Depth-Guided Mutual Learning Framework for Semi-Supervised Medical Polyp Segmentation
* Boost Your Human Image Generation Model via Direct Preference Optimization
* Boosting Adversarial Transferability through Augmentation in Hypothesis Space
* Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need
* Boosting Point-Supervised Temporal Action Localization through Integrating Query Reformation and Optimal Transport
* Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation
* BootPlace: Bootstrapped Object Placement with Detection Transformers
* Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
* BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
* Bounded-Function-Based Schemes for Finite-Time Control of a NWMR With Input Constraints
* Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection
* Breaking the Low-Rank Dilemma of Linear Attention
* Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
* BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
* Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
* Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
* Bridging Dimensions in Fingerprints to Advance Distinctiveness: Recovering 3D Minutiae From a Single Contactless 2D Fingerprint Image
* Bridging Gait Recognition and Large Language Models Sequence Modeling
* Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
* Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
* Bridging the Gap Between Active Faulting and Deformation Across Normal-Fault Systems in the Central-Southern Apennines (Italy): Multi-Scale and Multi-Source Data Analysis
* Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
* Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
* Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence
* Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
* Broadcast Authentication: An Efficient Identity Authentication Protocol With Provable Security for VANETs
* BRTAL: Boundary Refinement Temporal Action Localization via Offset-Driven Diffusion Models
* Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
* Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
* Building Vision Models upon Heat Conduction
* Bundle adjustment for multi-source Mars orbiter imagery with generalized control constraints
* BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer
* ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
* C2RF: Bridging Multi-Modal Image Registration and Fusion via Commonality Mining and Contrastive Learning
* CacheQuant: Comprehensively Accelerated Diffusion Models
* CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
* CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
* CADDreamer: CAD Object Generation from Single-view Images
* CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
* Calibrated Multi-Preference Optimization for Aligning Diffusion Models
* Calico: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
* Camera resection from known line pencils and a radially distorted scanline
* Camera-Proxy Enhanced Identity-Recalibration Learning for Unsupervised Visible-Infrared Person Re-Identification
* CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
* Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering
* CamPoint: Boosting Point Cloud Segmentation with Virtual Camera
* CaMuViD: Calibration-Free Multi-View Detection
* Can Generative Video Models Help Pose Estimation?
* Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
* Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding
* Can Text-to-Video Generation help Video-Language Alignment?
* Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices
* CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
* CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
* CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
* CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
* CARL: A Framework for Equivariant Image Registration
* CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-scale Reinforcement Learning in Autonomous Driving
* CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design
* Cascaded Detection Method for Ship Targets Using High-Frequency Surface Wave Radar in the Time-Frequency Domain
* Cascaded Dynamic Memory Refinement and Semantic Alignment for Exo-to-Ego Cross-View Video Generation
* Case Study on the Vertical Distribution and Correlation Between Low-Frequency Lightning Sources and Hydrometeors During a Thunderstorm, A
* CASP: Compression of Large Multimodal Models Based on Attention Sparsity
* CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
* CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
* CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
* Category-Agnostic Neural Object Rigging
* CAU2DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data
* Causal Composition Diffusion Model for Closed-loop Traffic Generation
* Causal Confusion in Pedestrian Crossing Intention Prediction for Autonomous Vehicles: The Role of Ego-Vehicle Speed
* CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
* CCDR: Combining Channel-Wise Convolutional Local Perception, Detachable Self-Attention, and a Residual Feedforward Network for PolSAR Image Classification
* CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
* CDI: Copyrighted Data Identification in Diffusion Models
* Cech Complex Generation With Homotopy Equivalence Framework for Myocardial Infarction Diagnosis Using Electrocardiogram Signals
* Certified Human Trajectory Prediction
* CFRANet: Cross-Modal Frequency-Responsive Attention Network for Thermal Power Plant Detection in Multispectral High-Resolution Remote Sensing Images
* CGMatch: A Different Perspective of Semi-supervised Learning
* CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching
* Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
* Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding
* ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
* Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation, The
* Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
* Channel Amplitude and Phase Error Estimation of Fully Polarimetric Airborne SAR with 0.1 m Resolution
* Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
* Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
* Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
* Characterization of Fresh and Aged Smoke Particles Simultaneously Observed with an ACTRIS Multi-Wavelength Raman Lidar in Potenza, Italy
* Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment
* Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
* Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
* ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
* ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
* ChatHuman: Chatting about 3D Humans with Tools
* Cheb-GR: Rethinking k-nearest neighbor search in Re-ranking for Person Re-identification
* Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss
* CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
* CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
* CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
* CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
* Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
* CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
* CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
* CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
* Class-Discrepancy Dynamic Weighting for Cross-Domain Few-Shot Hyperspectral Image Classification
* Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
* Classification of Precipitation Types and Investigation of Their Physical Characteristics Using Three-Dimensional S-Band Dual-Polarization Radar Data
* Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
* Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
* Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
* CleanDIFT: Diffusion Features without Noise
* ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
* Client-Unbiased Skeletal Action Recognizer in Federated Learning
* Climate Warming-Driven Expansion and Retreat of Alpine Scree in the Third Pole over the Past 45 Years
* ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
* CLIMS++: Cross Language Image Matching with Automatic Context Discovery for Weakly Supervised Semantic Segmentation
* CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR
* CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
* CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
* CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning
* CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
* Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
* Closer Look at Benchmarking Self-supervised Pre-training with Image Classification, A
* Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training, A
* Closest Neighbors are Harmful for Lightweight Masked Auto-encoders
* Cloud Vertical Structure Optimization Algorithm Combining FY-4A and DSCOVR Satellite Data, A
* Cloud-based satellite remote sensing for enhancing seagrass monitoring and ecosystem management
* Clustered Rainfall-Induced Landslides in Jiangwan Town, Guangdong, China During April 2024: Characteristics and Controlling Factors
* CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
* Co-op: Correspondence-based Novel Object Pose Estimation
* Co-Pseudo Labeling and Active Selection for Fundus Single-Positive Multi-Label Learning
* Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement
* CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
* CoA: Towards Real Image Dehazing via Compression-and-Adaptation
* COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
* Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
* Coastal Eddy Detection in the Balearic Sea: SWOT Capabilities
* COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
* COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
* CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
* CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images
* Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
* CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification
* Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
* Cognitive Contour Detection of Sparse-Structured Objects in the Alpha-Shape Scale Space
* Coherent 3D Portrait Video Reconstruction via Triplane Fusion
* CoIR: Compressive Implicit Radar
* ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration
* Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
* Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration
* CoLLM: A Large Language Model for Composed Image Retrieval
* Color Alignment in Diffusion
* CoMapGS: Covisibility Map-Based Gaussian Splatting for Sparse Novel View Synthesis
* CoMatcher: Multi-View Collaborative Feature Matching
* Combined Raman Lidar and Ka-Band Radar Aerosol Observations
* Combining Autoregressive and Non-Autoregressive Models for Ship License Plate Recognition
* Combining TanDEM-X Interferometry and GEDI Space LiDAR for Estimation of Forest Biomass Change in Tanzania
* CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
* ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
* CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
* Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
* Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
* Communication Strategy on Macro-and-Micro Traffic State in Cooperative Deep Reinforcement Learning for Regional Traffic Signal Control
* Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
* Comparative Global Assessment and Optimization of LandTrendr, CCDC, and BFAST Algorithms for Enhanced Urban Land Cover Change Detection Using Landsat Time Series
* Comparison of GOES16 Data with the TRACER-ESCAPE Field Campaign Dataset for Convection Characterization: A Selection of Case Studies and Lessons Learnt
* Comparison of NeRF- and SfM-Based Methods for Point Cloud Reconstruction for Small-Sized Archaeological Artifacts
* Compass Control: Multi Object Orientation Control for Text-to-Image Generation
* Competitive Analysis of Vehicle-Sharing Systems With Cournot Queueing Games
* COMPGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
* Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising
* Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
* Complexity Experts are Task-Discriminative Learners for Any Image Restoration
* Composing Parts for Expressive Object Generation
* Compositional Caching for Training-free Open-vocabulary Attribute Detection
* Compositional Physical Reasoning of Objects and Events From Videos
* Compositional Targeted Multi-Label Universal Perturbations
* Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers
* Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
* Comprehensive Review of Mathematical Error Characterization and Mitigation Strategies in Terrestrial Laser Scanning, A
* Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation, A
* Comprehensive Survey of the New Generation Pavement Structural Condition Assessment in Pavement Management System: Traffic Speed Deflection Device, A
* Computational Imaging for Long-Term Prediction of Solar Irradiance
* Computationally Inexpensive Decentralized Adaptive Asymptotic Tracking Control for a Single Under-Actuated High-Speed Train
* ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
* Concept Lancet: Image Editing with Compositional Representation Transplant
* Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
* ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
* Condensing Action Segmentation Datasets via Generative Network Inversion
* Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
* Conditional Time Series Diffusion Model for High-Speed Train Multi-Sensor Signals Imputation
* Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
* Conformal Prediction for Zero-Shot Models
* Conical Visual Concentration for Efficient Large Vision-Language Models
* ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
* Consistency Posterior Sampling for Diverse Image Synthesis
* Consistency-aware Self-Training for Iterative-based Stereo Matching
* Consistent and Controllable Image Animation with Motion Diffusion Models
* Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph
* Consistent Prompt Tuning for Generalized Category Discovery
* Constructing an Ecological Spatial Network Optimization Framework from the Pattern-Process-Function Perspective: A Case Study in Wuhan
* Construction and Application of Carbon Emissions Estimation Model for China Based on Gradient Boosting Algorithm
* Content-Decoupled Contrastive Learning-Based Implicit Degradation Modeling for Blind Image Super-Resolution
* Context-Aware Multimodal Pretraining
* ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
* Context-Enhanced Memory-Refined Transformer for Online Action Detection
* Contextual AD Narration with Interleaved Multimodal Sequence
* Continual SFT Matches Multimodal RLHF with Negative Supervision
* Continuous 3D Perception Model with Persistent State
* Continuous Adverse Weather Removal via Degradation-Aware Distillation
* Continuous Locomotive Crowd Behavior Generation
* Continuous Space-Time Video Resampling with Invertible Motion Steganography
* Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
* Contour-Aware Multi-Expert Model for Ambiguous Medical Image Segmentation
* Contrastive Learning via Variational Information Bottleneck
* Contrastive Learning-Based Hyperspectral Image Target Detection Using a Gated Dual-Path Network
* Contrastive MLP Network Based on Adjacent Coordinates for Cross-Domain Zero-Shot Hyperspectral Image Classification
* Contributions of Dust and Non-Dust Weather to Dust Emissions: A Case Study from the Central Taklimakan Desert
* ControlFace: Harnessing Facial Parametric Control for Face Rigging
* Controllable Human Image Generation with Personalized Multi-Garments
* Convective-Stratiform Identification Neural Network (CONSTRAINN) for the WIVERN Mission
* Convex Combination Star Shape Prior for Data-driven Image Semantic Segmentation
* Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
* CoPISan: Contrastive Perceptual Inference and Sanity Checks for Concept-Based CNN Explanations
* CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
* CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes
* Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
* Correction of ASCAT, ESA-CCI, and SMAP Soil Moisture Products Using the Multi-Source Long Short-Term Memory (MLSTM)
* Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning
* CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization
* CoSER: Towards Consistent Dense Multiview Text-To-Image Generator for 3D Creation
* COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
* COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
* CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
* Cost-Sensitive Small Vessel Detection Method for Maritime Remote Sensing Imagery, A
* CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
* Count-Free Single-Photon 3D Imaging With Race Logic
* CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
* COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
* Coverless Image Steganography Based on Semantic-Controlled Text-to-Image Generation
* CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
* Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
* CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner
* Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
* Creatively Upscaling Images with Global-Regional Priors
* CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
* Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
* CroCoDL: Cross-device Collaborative Dataset for Localization
* Crop Evapotranspiration Dynamics in Morocco's Climate-Vulnerable Saiss Plain
* Cropper: Vision-Language Model for Image Cropping through In-Context Learning
* CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention
* Cross-Modal 3D Representation with Multi-View Images and Point Clouds
* Cross-Modal Adaptive Prototype Learning for Continuous Sign Language Recognition
* Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
* Cross-modal Causal Relation Alignment for Video Question Grounding
* Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
* Cross-modal feature fusion for robust point cloud registration with ambiguous geometry
* Cross-Modal Information Flow in Multimodal Large Language Models
* Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
* Cross-Rejective Open-Set SAR Image Registration
* Cross-Scenario Vigilance Detection Based on EEG Analysis for Safety Driving in Autonomous
* cross-spatiotemporal weakly supervised framework for land cover classification: Generating temporally and spatially consistent land cover maps, A
* Cross-View Completion Models are Zero-shot Correspondence Estimators
* Cross-view geo-localization with panoramic street-view and VHR satellite imagery in decentrality settings
* Cross-view geolocation via segmentation and common region feature matching
* CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond
* CrossOver: 3D Scene Cross-Modal Alignment
* CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
* CryptoFace: End-To-End Encrypted Face Recognition
* CSAN: A Channel-Spatial Attention-Based Network for Meteorological Satellite Image Super-Resolution
* CSC-PA: Cross-image Semantic Correlation via Prototype Attentions for Single-network Semi-supervised Breast Tumor Segmentation
* CSDNet: Context-Aware Segmentation of Disaster Aerial Imagery Using Detection-Guided Features and Lightweight Transformers
* Ctrl-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
* CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
* Cubify Anything: Scaling Indoor 3D Object Detection
* Cumulative and Lagged Effects of Drought on the Phenology of Different Vegetation Types in East Asia, 2001-2020
* Current Progress in and Future Visions of Key Technologies of UAV-Borne Multi-Modal Geophysical Exploration for Mineral Exploration: A Scoping Review
* Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation
* Curriculum Direct Preference Optimization for Diffusion and Consistency Models
* CustAny: Customizing Anything from A Single Example
* Customized Condition Controllable Generation for Video Soundtrack
* CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
* CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
* Cyberattack Warning System for Enhancing Connected Vehicle Safety Under Spoofing Cyberattacks: A Generative-Based Human-in-the-Loop Trajectory Prediction Approach, A
* D2iT: Dynamic Diffusion Transformer for Accurate Image Generation
* D2PNet: Deep Detail Priors for Multispectral Image Fusion
* D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-Based Affective Recognition
* D3-Human: Dynamic Disentangled Digital Human from Monocular Video
* D3: Scaling Up Deepfake Detection by Learning from Discrepancy
* D3CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation
* D3T: Dual-Domain Diffusion Transformer in Triplanar Latent Space for 3D Incomplete-View CT Reconstruction
* DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
* DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing
* DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
* DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction
* DarkIR: Robust Low-Light Image Restoration
* DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
* DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
* Data Augmentation Method for Establishing a Relationship Model Between Composition and Viscoelastic Properties of Asphalt Binder, A
* Data Distributional Properties As Inductive Bias for Systematic Generalization
* Data ID Extraction Networks for Unsupervised Class- and Classifier-Free Detection of Adversarial Examples
* Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion
* Data-Adaptive Weight-Ensembling for Multi-task Model Fusion
* Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning, A
* Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales
* Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior
* Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
* Dataset for Semantic Segmentation in the Presence of Unknowns, A
* DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
* DCGSD: Low-Light Image Enhancement With Dual-Conditional Guidance Sparse Diffusion Model
* De2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation
* DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging
* Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
* Debris-Flow Erosion Volume Estimation Using a Single High-Resolution Optical Satellite Image
* DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
* Decentralized Diffusion Models
* Decentralized Nonconvex Low-rank Matrix Recovery
* Decision Making of Automated Vehicles in Mixed Environment Based on Bayesian Sequential Games
* Decision SpikeFormer: Spike-Driven Transformer for Decision Making
* DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
* DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
* Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
* Decoder-Only Image Registration
* Decomposition and Quantification of SOTIF Requirements for Perception Systems of Autonomous Vehicles
* Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
* Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
* Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
* Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
* Decoupled Motion Expression Video Segmentation
* DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction
* Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution
* Decoupling Training-Free Guided Diffusion by ADMM
* DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
* Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-term Fine-grained Tree Change Detection
* Deep Fair Multi-View Clustering with Attention KAN
* Deep Learning and Transformer Models for Groundwater Level Prediction in the Marvdasht Plain: Protecting UNESCO Heritage Sites: Persepolis and Naqsh-e Rustam
* Deep Learning Approach to Identify Rock Bolts in Complex 3D Point Clouds of Underground Mines Captured Using Mobile Laser Scanners, A
* Deep learning for wildfire risk prediction: Integrating remote sensing and environmental data
* Deep Learning Small Water Body Mapping by Transfer Learning from Sentinel-2 to PlanetScope
* Deep Learning-Based Echo Extrapolation Method by Fusing Radar Mosaic and RMAPS-NOW Data, A
* Deep Learning-Based Method for Detection of Multiple Maneuvering Targets and Parameter Estimation, A
* Deep Learning-Driven Multi-Temporal Detection: Leveraging DeeplabV3+/Efficientnet-B08 Semantic Segmentation for Deforestation and Forest Fire Detection
* Deep Reinforcement Learning Method for Autonomous Driving Integrating Multi-Modal Fusion, A
* Deep Reinforcement Learning Method with a Low Intercept Probability in a Netted Synthetic Aperture Radar, A
* Deep Rib Fracture Instance Segmentation and Classification From CT on the RibFrac Challenge
* DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
* DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis
* DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
* DefMamba: Deformable Visual State Space Model
* DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
* Deformable Radial Kernel Splatting
* DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image
* Degradation-Aware Feature Perturbation for All-in-One Image Restoration
* DEIM: DETR with Improved Matching for Fast Convergence
* DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
* DELBO: Efficient Score Algorithm for Feature Selection on Latent Variables of VAE
* Delineation of Dynamic Coastal Boundaries in South Africa from Hyper-Temporal Sentinel-2 Imagery
* DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
* Delving Into Instance Modeling for Weakly Supervised Video Anomaly Detection
* Denoising Functional Maps: Diffusion Models for Shape Correspondence
* Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes
* Dense Match Summarization for Faster Two-view Estimation
* Dense Matching with Low Computational Complexity for Disparity Estimation in the Radargrammetric Approach of SAR Intensity Images
* Dense-SfM: Structure from Motion with Dense Consistent Matching
* DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
* Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
* Depth-Guided Bundle Sampling for Efficient Generalizable Neural Radiance Field Reconstruction
* DepthCD: Depth prompting in 2D remote sensing imagery change detection
* DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
* DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
* DepthSplat: Connecting Gaussian Splatting and Depth
* Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
* DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
* Descriptor-In-Pixel: Point-Feature Tracking for Pixel Processor Arrays
* Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
* DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
* DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
* DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
* Detail-Preserving Latent Diffusion for Stable Shadow Removal
* Detailed Performance Evaluation of the GK2A Fog Detection Algorithm Using Ground-Based Visibility Meter Data (2021-2023, Part I), A
* Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
* Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
* Detecting Adversarial Data Using Perturbation Forgery
* Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
* Detecting Open World Objects via Partial Attribute Assignment
* Detecting Out-of-distribution through the Lens of Neural Collapse
* Detecting Planting Holes Using Improved YOLO-PH Algorithm with UAV Images
* Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection
* Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations
* Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators
* Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
* Development of a Spaceborne SAR Based on a Reflector Antenna, The
* Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation, The
* Devil is in Temporal Token: High Quality Video Reasoning Segmentation, The
* Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
* Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-To-Video Generation, The
* Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
* DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
* DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
* DFANet: A Deep Feature Attention Network for Building Change Detection in Remote Sensing Imagery
* DFAST: A Differential-Frequency Attention-Based Band Selection Transformer for Hyperspectral Image Classification
* dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
* DFM: Differentiable Feature Matching for Anomaly Detection
* DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
* DGMNet: Hyperspectral Unmixing Dual-Branch Network Integrating Adaptive Hop-Aware GCN and Neighborhood Offset Mamba
* DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
* DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
* DiC: Rethinking Conv3x3 Designs in Diffusion Models
* DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
* Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models
* Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
* DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
* DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
* Difference in MODIS Aerosol Retrieval Accuracy over Chinese Forested Regions, The
* Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation
* Differences in Time Comparison and Positioning of BDS-3 PPP-B2b Signal Broadcast Through GEO
* Differentiable Inverse Rendering with Interpretable Basis BRDFs
* DiffFNO: Diffusion Fourier Neural Operator
* DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement
* DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
* DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
* DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
* Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
* Diffusion Model is Effectively Its Own Teacher
* Diffusion Self-Distillation for Zero-Shot Customized Image Generation
* Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
* Diffusion-based Event Generation for High-Quality Image Deblurring
* Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
* Diffusion-Enhanced Test-Time Adaptation with Text and Image Augmentation
* DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
* DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
* DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
* DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
* DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
* Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
* DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
* DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
* Digital super-resolution for TDI remote sensing via micro-misaligned pushbroom imaging and computational reconstruction
* Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
* Digital Twin Vision for Freight Railroads: A Case Study in North America
* DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
* Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
* DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
* DIO: Decomposable Implicit 4D Occupancy-Flow World Model
* Directional Label Diffusion Model for Learning from Noisy Labels
* Directional Wave Spectrum Inversion Algorithm with HF Surface Wave Radar Network, A
* DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
* DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
* Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
* Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
* Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
* DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
* Discrete Index Graph Diffusion Model for 3D Meshes Synthesis, A
* Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
* Disentangled Pose and Appearance Guidance for Multi-Pose Generation
* Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality
* DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region
* Dispatching Method for Demand Responsive Transit With Passengers' Hidden Preference Exploitation Capability, A
* Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
* DiSRT-In-Bed: Diffusion-Based Sim-To-Real Transfer Framework for In-Bed Human Mesh Recovery
* Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
* Distilled Prompt Learning for Incomplete Multimodal Survival Prediction
* Distilling Long-tailed Datasets
* Distilling Monocular Foundation Model for Fine-grained Depth Completion
* Distilling Multi-Modal Large Language Models for Autonomous Driving
* Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
* Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
* Distinct Regional and Seasonal Patterns of Atmospheric NH3 Observed from Satellite over East Asia
* DistinctAD: Distinctive Audio Description Generation in Contexts
* Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment
* Distraction is All You Need for Multimodal Large Language Model Jailbreaking
* Distractor-Aware Memory for Visual Object Tracking with SAM2, A
* Distribution Learning Based on Evolutionary Algorithm-Assisted Deep Neural Networks for Imbalanced Image Classification
* Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
* DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
* DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
* DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
* DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
* Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
* Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
* DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
* DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification
* DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
* DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
* DMA-Net: Dynamic Morphology-Aware Segmentation Network for Remote Sensing Images
* DMF-YOLO: Dynamic Multi-Scale Feature Fusion Network-Driven Small Target Detection in UAV Aerial Images
* DMRFlow: 4D Radar Scene Flow Estimation With Decoupled Matching and Refinement
* DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
* DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
* Do computer vision foundation models learn the low-level characteristics of the human visual system?
* Do ImageNet-trained models learn shortcuts? The impact of frequency shortcuts on generalization
* Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
* Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
* Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
* Do Your Best and Get Enough Rest for Continual Learning
* DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
* Docopilot: Improving Multimodal Models for Document-Level Understanding
* DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
* DocScanner: Robust Document Image Rectification with Progressive Learning
* Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
* DocVLM: Make Your VLM an Efficient Reader
* Does Matter: Visual Navigation via Denoising Diffusion Bridge Models
* DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
* DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
* Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data
* Domain Generalization in CLIP via Learning with Diverse Text Prompts
* Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
* Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
* Doppelgängers and Adversarial Vulnerability
* Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
* DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
* DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
* Downscaled GOME-2 SIF Based on Machine Learning Enhances the Correlation with Ecosystem Productivity, The
* Downscaling of Urban Land Surface Temperatures Using Geospatial Machine Learning with Landsat 8/9 and Sentinel-2 Imagery
* DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
* DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
* DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation
* DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
* Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
* Dragin3D: Image Editing by Dragging in 3D Space
* DRAWER: Digital Reconstruction and Articulation With Environment Realism
* DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
* DreamOmni: Unified Image Generation and Editing
* DreamRelation: Bridging Customization and Relation Generation
* DreamText: High Fidelity Scene Text Synthesis
* DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking
* DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
* DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
* DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
* DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
* DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
* Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
* Driving Forces and Future Scenario Simulation of Urban Agglomeration Expansion in China: A Case Study of the Pearl River Delta Urban Agglomeration
* DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
* DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
* Drop2Sparse: Improving Dataset Distillation via Sparse Model
* DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
* DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
* DrVideo: Document Retrieval Based Long Video Understanding
* DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
* DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation
* DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
* DTOS: Dynamic Time Object Sensing with Large Multimodal Model
* Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
* Dual Diffusion for Unified Image Generation and Understanding
* Dual Distillation Fusion for Weakly Supervised Anomaly Detection in Surveillance Videos
* Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection
* Dual Exposure Stereo for Extended Dynamic Range 3D Imaging
* Dual Focus-Attention Transformer for Robust Point Cloud Registration
* Dual Graph Inference Network for Weakly Supervised Semantic Segmentation
* Dual Prompting Image Restoration with Diffusion Transformers
* Dual Semantic Guidance for Open Vocabulary Semantic Segmentation
* Dual-Agent Optimization Framework for Cross-Domain Few-Shot Segmentation
* Dual-Branch Spatial-Frequency Domain Fusion Method with Cross Attention for SAR Image Target Recognition, A
* Dual-Branch Spatial-Spectral Transformer with Similarity Propagation for Hyperspectral Image Classification
* Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening
* Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
* Dual-Level Cross-Modality Neural Architecture Search for Guided Image Super-Resolution
* Dual-Variable Selection Framework for Enhancing Forest Aboveground Biomass Estimation via Multi-Source Remote Sensing, A
* Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
* DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
* DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
* DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
* DV-Matcher: Deformation-based Non-Rigid Point Cloud Matching Guided by Pre-trained Visual Features
* DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
* DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
* DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
* DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
* DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
* DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
* Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
* Dynamic Camera Poses and Where to Find Them
* Dynamic Clustering and Anomaly Detection of Train Delays in Stream Data: An Incremental Dirichlet Process Approach
* Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration
* Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
* Dynamic Event-Triggered Nonsingular Predefined-Time Tracking Control for Fully Heterogeneous Vehicle Platoon With Spacing Constraints
* Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics
* Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
* Dynamic Model Merging With Mixture of Weights
* Dynamic Motion Blending for Versatile Motion Editing
* Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
* Dynamic Personalized Federated Learning for Cross-Spectral Palmprint Recognition
* Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration
* Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation
* Dynamic Updates for Language Adaptation in Visual-Language Tracking
* DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
* DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes
* DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
* DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework
* DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
* DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI
* EAP-GS: Efficient Augmentation of Pointcloud for 3D Gaussian Splatting in Few-shot Scene Reconstruction
* Early Detection of Soil Salinization by Means of Spaceborne Hyperspectral Imagery
* Early Warnings and Envelope Adjustment-Based Safety Flight Control With Application to Hypersonic Vehicles
* Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training
* EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
* EASEMVC: Efficient Dual Selection Mechanism for Deep Multi-View Clustering
* Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding
* EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
* EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
* EBS-EKF: Accurate and High Frequency Event-Based Star Tracking
* ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
* EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
* EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
* EchoONE: Segmenting Multiple Echocardiography Planes in One Model
* EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
* EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
* Eco-Driving Decision Making Based on V2X Communication and Spatio-Temporal Prediction of Pedestrians
* ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
* ED4: Explicit Data-Level Debiasing for Deepfake Detection
* EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
* EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
* Edge-CVT: Edge-informed CNN and vision transformer for building change detection in satellite imagery
* Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
* EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds
* EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
* EdgeTAM: On-Device Track Anything Model
* Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
* EditAR: Unified Conditional Generation with Autoregressive Models
* EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
* EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
* EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
* Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
* Effective SAM Combination for Open-Vocabulary Semantic Segmentation
* Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
* Effects of Clouds and Shadows on the Use of Independent Component Analysis for Feature Extraction
* Effects of Large- and Meso-Scale Circulation on Uprising Dust over Bodélé in June 2006 and June 2011
* Efficient Aerial Image Detection with Variable Receptive Fields, An
* Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
* Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
* Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression
* Efficient Decoupled Optimization Algorithm for a Class of Regression Models, An
* Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
* Efficient Diffusion as Low Light Enhancer
* Efficient Diffusion Models: A Comprehensive Survey From Principles to Practices
* Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
* Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention
* Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
* Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
* Efficient Motion-Aware Video MLLM
* Efficient Multidimensional Parameter Estimation Using Machine Learning-Assisted SAGE Algorithm
* Efficient Personalization of Quantized Diffusion Model without Backpropagation
* Efficient Sampling Schemes for 3D Imaging of Radar Target Scattering Based on Synchronized Linear Scanning and Rotational Motion
* Efficient Ship Target Integrated Imaging and Detection Framework (ST-IIDF) for Space-Borne SAR Echo Data, An
* Efficient Test-Time Adaptive Object Detection via Sensitivity-Guided Pruning
* Efficient Transfer Learning for Video-language Foundation Models
* Efficient Unsupervised Clustering of Hyperspectral Images via Flexible Multi-Anchor Graphs
* Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
* Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance
* Efficient Visual State Space Model for Image Deblurring
* EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
* EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
* EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
* Effortless Active Labeling for Long-Term Test-Time Adaptation
* Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
* EgoLife: Towards Egocentric Life Assistant
* EgoLM: Multi-Modal Language Model of Egocentric Motions
* EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
* EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
* EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
* EigenGS Representation: From Eigenspace to Gaussian Image Space
* Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
* Elevation Models, Shadows, and Infrared: Integrating Datasets for Thermographic Leak Detection
* Elevation-Aware Domain Adaptation for Semantic Segmentation of Aerial Images
* Elevation-Coupled Multivariate Regression Model for GNSS-Based FY-4A Precipitable Water Vapor, An
* Embodied Scene Understanding for Vision Language Models via MetaVQA
* Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
* EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
* EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
* EmoEdit: Evoking Emotions through Image Manipulation
* EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
* EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
* Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
* Empowering Large Language Models with 3D Situation Awareness
* Empowering LLMs to Understand and Generate Complex Vector Graphics
* Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility
* Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis
* End-to-End Deep Learning Generative Framework for Refinable Shape Matching and Generation, An
* End-to-End HOI Reconstruction Transformer with Graph-based Encoding
* End-to-End Implicit Neural Representations for Classification
* End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models, An
* Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
* EnergyMogen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
* Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
* Enhanced Mainlobe Jamming Suppression in Distributed Array Radar via Joint Optimization of Radar Positions and Subpulse Frequencies
* Enhanced Multi-ULA Sparse Array With Improved DOA Estimation Performance, An
* Enhanced Offline/Online Heterogeneous Signcryption Protocol With Batch Verification for Edge Computing-Based VANETs, An
* Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
* Enhanced Rapid Mangrove Habitat Mapping Approach to Setting Protected Areas Using Satellite Indices and Deep Learning: A Case Study of the Solomon Islands
* Enhanced then Progressive Fusion with View Graph for Multi-View Clustering
* Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
* Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
* Enhancing Adversarial Transferability with Checkpoints of a Single Model's Training
* Enhancing Creative Generation on Stable Diffusion-based Models
* Enhancing Dance-To-Music Generation via Negative Conditioning Latent Diffusion Model
* Enhancing Dataset Distillation via Non-Critical Region Refinement
* Enhancing Direction-of-Arrival Estimation for Single-Channel Reconfigurable Intelligent Surface via Phase Coding Design
* Enhancing Diversity for Data-free Quantization
* Enhancing Facial Privacy Protection via Weakening Diffusion Purification
* Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
* Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization
* Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
* Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
* Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild
* Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
* Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling
* Enhancing Visible-Infrared Person Re-Identification With Modality- and Instance-Aware Adaptation Learning
* Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
* EnliveningGS: Active Locomotion of 3DGS
* EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion
* EntitySAM: Segment Everything in Video
* EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection
* EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
* EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
* EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation
* Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
* Erasing Undesirable Influence in Diffusion Models
* Error Mitigation Teacher for Semi-Supervised Remote Sensing Object Detection
* ERUPT: Efficient Rendering with Unposed Patch Transformer
* ES-Net Empowers Forest Disturbance Monitoring: Edge-Semantic Collaborative Network for Canopy Gap Mapping
* ESC: Erasing Space Concept for Knowledge Deletion
* ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
* Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
* Establishment and Verification of a Velocity Doppler Transfer Model for Dual-Beam Squint Airborne SAR, The
* Estimating Body and Hand Motion in an Ego-sensed World
* Estimating carbon fluxes over North America using a physics-constrained deep learning model
* Estimating Household Green Space in Composite Residential Community Solely Using Drone Oblique Photography
* Estimation and Change Analysis of Grassland AGB in the China-Mongolia-Russia Border Area Based on Multi-Source Geospatial Data
* Estimation of Footprint-Scale Across-Track Slopes Based on Elevation Frequency Histogram from Single-Track ICESat-2 Photon Data of Strong Beam
* Estimation of Forest Aboveground Biomass Using Sentinel-1/2 Synergized with Extrapolated Parameters from LiDAR Data and Analysis of Its Ecological Driving Factors
* Estimation of Leaf, Spike, Stem and Total Biomass of Winter Wheat Under Water-Deficit Conditions Using UAV Multimodal Data and Machine Learning
* Estimation of Rice Leaf Nitrogen Content Using UAV-Based Spectral-Texture Fusion Indices (STFIs) and Two-Stage Feature Selection
* Estimation of Subtropical Forest Aboveground Biomass Using Active and Passive Sentinel Data with Canopy Height
* Estimation of Tree Diameter at Breast Height (DBH) and Biomass from Allometric Models Using LiDAR Data: A Case of the Lake Broadwater Forest in Southeast Queensland, Australia
* Estimation of Ultrahigh Resolution PM2.5 in Urban Areas by Using 30 m Landsat-8 and Sentinel-2 AOD Retrievals
* ETAP: Event-based Tracking of Any Point
* Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
* Eval3D: Interpretable and Fine-Grained Evaluation for 3D Generation
* Evaluating Effectiveness and Identifying Appropriate Methods for Anomaly Detection in Intelligent Transportation Systems
* Evaluating Geostatistical and Statistical Merging Methods for Radar-Gauge Rainfall Integration: A Multi-Method Comparative Study
* Evaluating Model Perception of Color Illusions in Photorealistic Scenes
* Evaluating Remote Sensing Products for Pasture Composition and Yield Prediction
* Evaluating the Interferometric Performance of China's Dual-Star SAR Satellite Constellation in Large Deformation Scenarios: A Case Study in the Jinchuan Mining Area, Gansu
* Evaluating UAV LiDAR and Field Spectroscopy for Estimating Residual Dry Matter Across Conservation Grazing Lands
* Evaluating Vision-Language Models as Evaluators in Path Planning
* Evaluation and Improvement of Ocean Color Algorithms for Chlorophyll-a and Diffuse Attenuation Coefficients in the Arctic Shelf
* Evaluation of Photogrammetric Methods for Displacement Measurement During Structural Load Testing
* Evaluation of the Accuracy and Applicability of Reanalysis Precipitation Products in the Lower Yarlung Zangbo Basin
* EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
* Event Cameras Meet SPADs for High-Speed, Low-Bandwidth Imaging
* Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
* Event fields: Capturing light fields at high speed, resolution, and dynamic range
* Event-based Video Super-Resolution via State Space Models
* Event-Equalized Dense Video Captioning
* EventFly: Event Camera Perception from Ground to the Sky
* EventGPT: Event Stream Understanding with Multimodal Large Language Models
* EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera
* EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
* Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
* Everything to the Synthetic: Diffusion-Driven Test-Time Adaptation via Synthetic-Domain Alignment
* EvOcc: Accurate Semantic Occupancy for Automated Driving Using Evidence Theory
* EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
* Evolution and Driving Forces of Ecological Service Value in Response to Land Use Change in Tarim Basin, Northwest China
* Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization
* EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
* EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
* Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
* ExpertAF: Expert Actionable Feedback from Video
* Explainable Saliency: Articulating Reasoning with Contextual Prioritization
* Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
* Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics
* Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves
* Exploiting Deblurring Networks for Radiance Fields
* Exploiting Temporal State Space Sharing for Video Semantic Segmentation
* Exploration-Driven Generative Interactive Environments
* Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
* Exploring Contextual Attribute Density in Referring Expression Counting
* Exploring Historical Information for RGBE Visual Tracking with Mamba
* Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
* Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild
* Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation
* Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment
* Exploring Simple Open-Vocabulary Semantic Segmentation
* Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
* Exploring Temporally-Aware Features for Point Tracking
* Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
* Exploring the Effects of Support Restoration on Pictorial Layers Through Multi-Resolution 3D Survey
* Exploring Timeline Control for Facial Motion Generation
* Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
* Exposure-slot: Exposure-centric representations learning with Slot-in-Slot Attention for Region-aware Exposure Correction
* Extraction of Agricultural Parcels Using Vector Contour Segmentation Network with Hybrid Backbone and Multiscale Edge Feature Extraction
* Extraction of Sparse Vegetation Cover in Deserts Based on UAV Remote Sensing
* Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
* Extreme Rotation Estimation in the Wild
* Eye in the Sky for Sub-Tidal Seagrass Mapping: Leveraging Unsupervised Domain Adaptation with SegFormer for Multi-Source and Multi-Resolution Aerial Imagery
* EZSR: Event-based Zero-Shot Recognition
* F-LMM: Grounding Frozen Large Multimodal Models
* F3OCUS: Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics
* Face Forgery Video Detection via Temporal Forgery Cue Unraveling
* FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
* FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
* Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects
* FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
* FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
* FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
* FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
* False-Positive-Centric Framework for Object Detection Disambiguation, A
* FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
* Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
* Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
* Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations
* Fast and Lightweight 3D Keypoint Detector, A
* Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
* Faster Parameter-Efficient Tuning with Token Redundancy Reduction
* FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
* FasterCReW: Performance or Efficiency? A Lightweight Conditional Residual DNN-Based Watermarking Based on FasterNet
* FastVLM: Efficient Vision Encoding for Vision Language Models
* FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
* FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
* Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
* Feature Constraints Map Generation Models Integrating Generative Adversarial and Diffusion Denoising
* Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection
* Feature Preserving Shrinkage on Bayesian Neural Networks Via the R2D2 Prior
* Feature Selection for Latent Factor Models
* Feature Spectrum Learning for Remote Sensing Change Detection
* Feature-Preserving Mesh Decimation for Normal Integration
* Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
* FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors
* FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
* FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning
* FedCS: Coreset Selection for Federated Learning
* Federated Learning with Domain Shift Eraser
* FedMIA: An Effective Membership Inference Attack Exploiting All for One Principle in Federated Learning
* FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
* FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
* FEMNet: A Feature-Enriched Mamba Network for Cloud Detection in Remote Sensing Imagery
* Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
* Few-Shot Implicit Function Generation via Equivariance
* Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation
* Few-shot Personalized Scanpath Prediction
* Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
* Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching
* Few-shot Remote Sensing Scene Classification via Parameter-free Attention and Region Matching
* FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
* FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
* FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
* FIction: 4D Future Interaction Prediction from Video
* FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation
* FilmComposer: LLM-Driven Music Production for Silent Film Clips
* Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
* FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
* Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
* Fine Recognition of MEO SAR Ship Targets Based on a Multi-Level Focusing-Classification Strategy
* Fine-Grained Erasure in Text-To-Image Diffusion-Based Foundation Models
* Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
* Fine-Grained Land Use Remote Sensing Mapping in Karst Mountain Areas Using Deep Learning with Geographical Zoning and Stratified Object Extraction
* FineCaption: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
* FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs
* FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
* Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
* FineVQ: Fine-Grained User Generated Content Video Quality Assessment
* Fingerprinting Denoising Diffusion Probabilistic Models
* Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
* FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
* FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
* FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
* FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
* First Step of AI in LEO SOPs: DRL-Driven Epoch Credibility Evaluation to Enhance Opportunistic Positioning Accuracy, The
* Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
* FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
* Fitted Neural Lossless Image Compression
* Flag Decomposition for Hierarchical Datasets, A
* FLAIR: VLM with Fine-grained Language-informed Image Representations
* FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
* FLARE: Feed-Forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
* Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
* Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
* FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering
* FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
* FLAVC: Learned Video Compression with Feature Level Attention
* FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering
* FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting
* Flexible Frame Selection for Efficient Video Reasoning
* Flexible Group Count Enables Hassle-Free Structured Pruning
* FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
* FlexiSAM: A flexible SAM-based semantic segmentation model for land cover classification using high-resolution multimodal remote sensing imagery
* FlexUOD: The Answer to Real-world Unsupervised Image Outlier Detection
* FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
* Floating No More: Object-Ground Reconstruction from a Single Image
* Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
* FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
* Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
* Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
* FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
* Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation
* FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
* FluxSpace: Disentangled Semantic Editing in Rectified Flow Models
* Focal Split: Untethered Snapshot Depth from Differential Defocus
* Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
* FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
* Focused Human Body Model for Accurate Anthropometric Measurements Extraction, A
* Focusing on Tracks for Online Multi-Object Tracking
* Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
* Font-Agent: Enhancing Font Understanding with Large Language Models
* Forcing the SAFY Dynamic Crop Growth Model with Sentinel-2 LAI Estimates and Weather Inputs from AgERA5 Reanalysis and CM SAF SARAH-3 Radiation Data for Estimating Crop Water Requirements and Yield
* Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images
* Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection
* Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
* Forest Fragmentation in Bavaria: A First-Time Quantitative Analysis Based on Earth Observation Data
* ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images
* Formal Modeling and Synthesis of Longitudinal Dynamics Controller for Train Platoons
* Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions
* Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution
* Foundations of the Theory of Performance-Based Ranking
* FoundationStereo: Zero-Shot Stereo Matching
* FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
* Foveated Instance Segmentation
* FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild
* FracFormer: Fracture Reduction Planning With Transformer-Based Shape Restoration and Fracture Data Simulation
* Fractal Calibration for long-tailed object detection
* FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video
* FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
* Framework for Mapping Sublimation Features on Mars' South Polar Cap Using Object-Based Image Analysis
* Framework to Retrieve Water Quality Parameters in Small, Optically Diverse Freshwater Ecosystems Using Sentinel-2 MSI Imagery, A
* Free Lunch Enhancements for Multi-modal Crowd Counting
* Free Lunch to Meet the Gap: Intermediate Domain Reconstruction for Cross-Domain Few-Shot Learning
* Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
* Free-viewpoint Human Animation with Pose-correlated Reference Selection
* Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
* FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling
* FreeFusion: Infrared and Visible Image Fusion via Cross Reconstruction Learning
* FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
* Freehand Ultrafast Doppler Ultrasound Imaging With Optical Tracking Allows for Detailed 3D Reconstruction of Blood Flow in the Human Brain
* FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
* FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
* FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
* FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
* FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy
* Freezing Fog Microphysics and Visibility Based on CFACT Feb 19 Case
* FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
* Frequency Dynamic Convolution for Dense Image Prediction
* Frequency Syntonization Based on PDOA Protocol in Multi-Band Systems
* Frequency-Biased Synergistic Design for Image Compression and Compensation
* FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
* From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
* From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
* From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
* From FastPoseGait to GPGait++: Bridging the Past and Future for Pose-Based Gait Recognition
* From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
* From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
* From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
* From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
* From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
* From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
* From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
* From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
* From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
* From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
* From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
* FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
* FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
* FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
* FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones
* FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
* FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
* Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers
* FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
* Functionality understanding and segmentation in 3D scenes
* FuseFormer: A Manifold Metric Fusing Attention for Pedestrian Trajectory Prediction
* Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval
* g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
* G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
* GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding
* GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
* Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
* GaitC3I: Robust Cross-Covariate Gait Recognition via Causal Intervention
* Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
* GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction
* GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
* GASP: Gaussian Avatars with Synthetic Priors
* GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
* GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping
* Gaussian Eigen Models for Human Heads
* Gaussian Mixture Segmentation for Managing Deterioration of Large-Scale Road Networks
* Gaussian Prompter: Linking 2D Prompts for 3D Gaussian Segmentation
* Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
* Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization
* Gaussian Splatting for Efficient Satellite Image Photogrammetry
* GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
* GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
* GaussianSpa: An Optimizing-Sparsifying Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
* GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
* GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
* GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
* GauSTAR: Gaussian Surface Tracking and Reconstruction
* Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
* GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations
* Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging
* Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
* GBC-Splat: Generalizable Gaussian-Based Clothed Human Digitalization under Sparse RGB Cameras
* GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection
* GCC: Generative Color Constancy via Diffusing a Color Checker
* GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
* GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
* GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
* Gen3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
* Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
* GenAssets: Generating in-the-wild 3D Assets in Latent Space
* GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
* General 3D Vision-Language Model With Fast Rendering and Pre-Training Vision-Language Alignment
* General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening, A
* General Model for Converting All-Wave Net Radiation at Instantaneous to Daily Scales Under Clear Sky, A
* Generalizable Object Keypoint Localization from Generative Priors
* Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
* Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
* Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
* Generalized Relative Pose and Scale from Affine Correspondences
* Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation
* Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
* Generating 3D-Consistent Videos from Unposed Internet Photos
* Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
* Generating Multimodal Driving Scenes via Next-Scene Prediction
* Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
* Generative Gaussian Splatting for Unbounded 3D City Generation
* Generative Hard Example Augmentation for Semantic Point Cloud Segmentation
* Generative Image Layer Decomposition with Visual Effects
* Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
* Generative Map Priors for Collaborative BEV Semantic Segmentation
* Generative Modeling of Class Probability for Multi-Modal Representation Learning
* Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
* Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
* Generative Omnimatte: Learning to Decompose Video into Layers
* Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
* Generative Photomontage
* Generative Sparse-View Gaussian Splatting
* Generative Video Propagation
* Generative Zero-Shot Composed Image Retrieval
* GenFusion: Closing the Loop between Reconstruction and Generation via Videos
* GENIUS: A Generative Framework for Universal Multimodal Search
* GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
* GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
* GenVDM: Generating Vector Displacement Maps From a Single Image
* GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos
* GeoDepth: From Point-to-Depth to Plane-to-Depth Modeling for Self-Supervised Monocular Depth Estimation
* Geological Mapping and Rover Mobility Planning Integration: A Case Study for Zhurong Rover's Landing Area
* Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning
* Geometry and Kinematics of the North Karlik Tagh Fault: Implications for the Transpressional Tectonics of Easternmost Tian Shan
* Geometry Field Splatting with Gaussian Surfels
* Geometry in Style: 3D Stylization via Surface Normal Deformation
* Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency
* GeoMM: On Geodesic Perspective for Multi-modal Learning
* Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
* GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
* GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
* GG-SSMs: Graph-Generating State Space Models
* GIF: Generative Inspiration for Face Recognition at Scale
* GIFStream: 4D Gaussian-Based Immersive Video with Feature Stream
* GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
* GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
* GLane3D: Detecting Lanes with Graph of 3D Keypoints
* GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
* GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
* Global Validation of the Version F Geophysical Data Records from the TOPEX/POSEIDON Altimetry Satellite Mission
* Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
* Glossy Object Reconstruction with Cost-effective Polarized Acquisition
* GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
* GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
* GNSS-Based Models of Displacement, Stress, and Strain in the SHETPENANT Region: Impact of Geodynamic Activity from the ORCA Submarine Volcano
* GNSS-Based Multi-Target RDM Simulation and Detection Performance Analysis
* GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
* Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
* GOAL: Global-local Object Alignment Learning
* GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
* GOFENet: A Hybrid Transformer-CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
* Goku: Flow Based Video Generative Foundation Models
* Golden Cudgel Network for Real-Time Semantic Segmentation
* GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis
* Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
* GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
* GPS as a Control Signal for Image Generation
* GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes
* Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
* Gradient-Guided Annealing for Domain Generalization
* GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking
* Graph Neural Network Combining Event Stream and Periodic Aggregation for Low-Latency Event-Based Vision
* Graph Representation Learning Approach for Imbalanced Ship Type Recognition Using AIS Trajectory Data, A
* Graph-Based Approach to the Topological Optimization of Cycling Networks for the Improvement of Safety and Comfort of Cyclists, A
* Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection
* GraphGPT-o: Synergistic Multimodal Comprehension and Generation on Graphs
* GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning
* GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
* grazing pressure mapping method for large-scale, complex surface scenarios: integrating deep learning and spatio-temporal characteristic of remote sensing, A
* GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
* Gromov-Wasserstein Problem with Cyclic Symmetry
* GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
* Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
* Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
* GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model
* GroupMamba: Efficient Group-Based Visual State Space Model
* GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
* GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction
* GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
* GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
* Guest Editorial Introduction to the Special Issue on Cyber and Digital Information in Railway Engineering and Operation
* Guest Editorial: Introduction to the Special Section on Computational Photography
* GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
* Guiding Human-Object Interactions with Rich Geometry and Relations
* Gyro-based Neural Single Image Deblurring
* h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform
* H-MoRe: Learning Human-centric Motion Representation for Action Analysis
* H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection
* HAF-YOLO: Dynamic Feature Aggregation Network for Object Detection in Remote-Sensing Images
* Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
* HalLoc: Token-level Localization of Hallucinations for Vision Language Models
* Hand-held Object Reconstruction from RGB Video with Dynamic Interaction
* Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
* HandNeRF++: Modeling Animatable Interacting Hands With Neural Radiance Fields
* HandOS: 3D Hand Reconstruction in One Stage
* Hardware-Rasterized Ray-Based Gaussian Splatting
* HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
* Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
* Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
* Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization
* Hash3D: Training-free Acceleration for 3D Generation
* HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
* Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction
* HD-EPIC: A Highly-Detailed Egocentric Video Dataset
* HDRoad: An encoder-decoder architecture with hybrid attention and directional prior for efficient road extraction from remote sensing images
* Hearing Anywhere in Any Environment
* Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
* HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery
* Heavy-Haul Train Braking Simulation With Fluid Dynamics-Based Air Braking System
* HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
* Height estimation from monocular aerial images using convolutional multi-scale and transformer coupling network (CMT)
* HELVIPAD: A Real-World Dataset for Omnidirectional Stereo Depth Estimation
* HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration
* HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars
* Heterogeneous Skeleton-Based Action Representation Learning
* Heterophily-Aware Representation Learning on Heterogeneous Graphs
* Hiding Images in Diffusion Models by Editing Learned Score Functions
* Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal
* Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
* Hierarchical Encrypted Compression Scheme for Intra-Vehicle Network, A
* Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
* Hierarchical Flow Diffusion for Efficient Frame Interpolation
* Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation
* Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
* Hierarchical Path Planning Framework of Plant Protection UAV Based on the Improved D3QN Algorithm and Remote Sensing Image, A
* HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
* HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
* High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm
* High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
* High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
* High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion
* High-Fidelity Lightweight Mesh Reconstruction from Point Clouds
* High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
* High-Order Multi-Scale Attention and Vertical Discriminator Enhanced CLIP for Monocular Depth Estimation
* High-Performance Real-World Optical Computing Trained by in Situ Gradient-Based Model-Free Optimization
* High-quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding
* High-Risk Trajectories Generation for Safety Testing of Autonomous Vehicles Based on In-Depth Crash Data
* Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching
* HiGoReg: A Hierarchical Grouping Strategy for Point Cloud Registration
* HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
* HiLM-D: Enhancing MLLMs with Multi-scale High-Resolution Details for Autonomous Driving
* HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
* HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
* HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
* HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
* HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving
* History-Enhanced 3D Scene Graph Reasoning From RGB-D Sequences
* HIUFE: Hybrid intelligence-based unauthorized farmland excavation scene cognition
* HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
* HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
* HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
* HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
* HOLI-1-to-3: Transient-Enhanced Holistic Image-to-3D Generation
* Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
* HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
* Homogeneous Dynamics Space for Heterogeneous Humans
* HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
* Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-To-Ground Scenes
* HORP: Human-Object Relation Priors Guided HOI Detection
* HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
* HOT: Hadamard-based Optimized Training
* HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
* HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
* HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
* How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
* How Reliable Are the Spectral Vegetation Indices for the Assessment of Tree Condition and Mortality in European Temporal Forests?
* How to Enhance the Interpretability of Learning-Based Motion Planning for Intelligent Vehicles: A Survey
* How to Merge Your Multimodal Models Over Time?
* HRAvatar: High-Quality and Relightable Gaussian Head Avatar
* HSAA-CD: A Hierarchical Semantic Aggregation Mechanism and Attention Module for Non-Agricultural Change Detection in Cultivated Land
* HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction
* HSI: A Holistic Style Injector for Arbitrary Style Transfer
* Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering, A
* Human Activities Dominantly Driven the Greening of China During 2001 to 2020
* Human Motion Instruction Tuning
* Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
* Human-Machine Shared Control for Steer-by-Wire Vehicles Using Improved Reinforcement Learning-Based MPC
* HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
* HumanMM: Global Human Motion Recovery from Multi-shot Videos
* HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
* Humidify Feedback of Wetland Changes in the China Side of the Heilongjiang River Basin
* HuMoCon: Concept Discovery for Human Motion Understanding
* HUNet: Homotopy Unfolding Network for Image Compressive Sensing
* HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
* HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
* HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
* HVI: A New Color Space for Low-light Image Enhancement
* Hybrid ANN-GWR Model for High-Accuracy Precipitation Estimation, A
* Hybrid Concept Bottleneck Models
* Hybrid Ensemble Learning Model Combining BERT and CNN for Predicting Urban Rail Transit Accident Consequences
* Hybrid Framework for Soil Property Estimation from Hyperspectral Imaging, A
* Hybrid GIS-Transformer Approach for Forecasting Sentinel-1 Displacement Time Series
* Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
* Hybrid Machine Learning Model for Hurricane Power Outage Estimation from Satellite Night Light Data
* Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation
* Hybrid Spatio-Temporal Graph Attention (ST D-GAT Framework) for Imputing Missing SBAS-InSAR Deformation Values to Strengthen Landslide Monitoring, A
* Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
* HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
* HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
* Hyperbolic Category Discovery
* Hyperbolic Safety-Aware Vision-Language Models
* Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation
* Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception
* HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
* HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
* Hypergraph Mamba Reasoning-Based Social Relation Recognition
* Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
* HyperGS: Hyperspectral 3D Gaussian Splatting
* HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
* HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
* HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
* HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
* HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
* Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance
* HyperspectralMamba: A Novel State Space Model Architecture for Hyperspectral Image Classification
* HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Hyperspectral Image Classification
* I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
* IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
* ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
* IceDiff: High Resolution and High-Quality Arctic Sea Ice Forecasting with Generative Diffusion Prior
* ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
* ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
* ID-Patch: Robust ID Association for Group Photo Personalization
* IDEA-Bench: How Far are Generative Models from Professional Designing?
* IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
* Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
* Identifying and Mitigating Spurious Correlation in Multi-Task Learning
* Identifying Deformation Drivers in Dam Segments Using Combined X- and C-Band PS Time Series
* Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
* Identity-preserving Distillation Sampling by Fixed-Point Iterator
* Identity-Preserving Text-To-Video Generation by Frequency Decomposition
* IDOL: Instant Photorealistic 3D Human Creation from a Single Image
* IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
* iG-6DoF: Model-Free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting
* ILIAS: Instance-Level Image retrieval At Scale
* Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation
* Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models, The
* IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos
* IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
* Image Captions are Natural Prompts for Training Data Synthesis
* Image Denoising Using Green Channel Prior
* Image Generation Diversity Issues and How to Tame Them
* Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
* Image Lens Flare Removal Using Adversarial Curve Learning
* Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
* Image Quality Assessment: From Human to Machine Preference
* Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
* Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays
* Image Referenced Sketch Colorization Based on Animation Creation Workflow
* Image-like Diffusion Method for Human-Object Interaction Detection, An
* Image-to-Image Bayesian Flow Networks With Structurally Informative Priors
* Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
* ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning
* IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
* Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
* Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
* Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation, The
* Impact of Cloud Microphysics Initialization Using Satellite and Radar Data on CMA-MESO Forecasts
* Impact of Input Image Resolution on Deep Learning Performance for Side-Scan Sonar Classification: An Accuracy-Efficiency Analysis
* Impact of Plateau Grassland Degradation on Ecological Suitability: Revealing Degradation Mechanisms and Dividing Potential Suitable Areas with Multi Criteria Models
* Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
* Implicit Correspondence Learning for Image-to-Point Cloud Registration
* Improve Representation for Imbalanced Regression through Geometric Constraints
* Improved General Five-Component Scattering Power Decomposition Method, An
* Improved InTEC Model for Estimating the Carbon Budgets in Eucalyptus Plantations, An
* Improved monocular depth prediction using distance transform over pre-semantic contours with self-supervised neural networks
* Improved Pacific Decadal Oscillation Prediction by an Optimizing Model Combined Bidirectional Long Short-Term Memory and Multiple Modal Decomposition
* Improved Photosynthetic Accumulation Models for Biomass Estimation of Soybean and Cotton Using Vegetation Indices and Canopy Height
* Improved Spiral Projection MR Fingerprinting via Memory-Efficient Synergic Optimization of 3D Spiral Trajectory, Image Reconstruction and Parameter Estimation (SOTIP)
* Improved Video VAE for Latent Video Diffusion Model
* Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
* Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
* Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
* Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
* Improving Editability in Image Generation with Layer-wise Memory
* Improving Gaussian Splatting with Localized Points Management
* Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions
* Improving Personalized Search with Regularized Low-Rank Parameter Updates
* Improving Representation of High-Frequency Components for Medical Visual Foundation Models
* Improving Semi-Supervised Semantic Segmentation with Sliced-Wasserstein Feature Alignment and Uniformity
* Improving Sound Source Localization with Joint Slot Attention on Image and Audio
* Improving the Estimates of County-Level Forest Attributes Using GEDI and Landsat-Derived Auxiliary Information in Fay-Herriot Models
* Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling
* Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
* Improving the Universal Performance of Land Cover Semantic Segmentation Through Training Data Refinement and Multi-Dataset Fusion via Redundant Models
* Improving Transferable Targeted Attacks with Feature Tuning Mixup
* Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration
* Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
* ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
* In-Flight Calibration of Geostationary Meteorological Imagers Using Alternative Methods: MTG-I1 FCI Case Study
* in-seasonal phenology monitoring approach for wheat breeding accessions with time-series RGB imagery by using a combination KNN-CNN-RF model, A
* Incentivizing Cooperative Sensing Sharing Ecosystem for Connected and Autonomous Vehicles
* IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera
* Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model
* Incomplete Multi-View Multi-Label Learning via Disentangled Representation and Label Semantic Embedding
* Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models
* Incorporating Pre-Training Data Matters in Unsupervised Domain Adaptation
* Increasing the Thematic Resolution for Trees and Built Area in a Global Land Cover Dataset Using Class Probabilities
* Incremental Object Keypoint Learning
* IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
* Induced Polarization Imaging: A Geophysical Tool for the Identification of Unmarked Graves
* Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications
* Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning
* Infinity∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
* Influence of Viewing Geometry on Hyperspectral-Based Soil Property Retrieval, The
* Information Fusion for Secure Autonomous Drone Operations
* Information Theory-Inspired Strategy for Automated Network Pruning, An
* Information-Theoretic Analysis of Multimodal Image Translation
* INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
* InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
* InSAR Detection of Slow Ground Deformation: Taking Advantage of Sentinel-1 Time Series Length in Reducing Error Sources
* InSAR Inversion of the Source Mechanism of the 23 January 2024 Xinjiang Wushi Mw7.0 Earthquake
* Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
* InsightEdit: Towards Better Instruction Following for Image Editing
* Insightful Instance Features for 3D Instance Segmentation
* Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
* InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
* Instance-wise Supervision-level Optimization in Active Learning
* InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
* InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
* Instant Adversarial Purification with Adversarial Consistency Distillation
* Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
* Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
* InstantGroup: Instant Template Generation for Scalable Group of Brain MRI Registration
* Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
* Instruction-based Image Manipulation by Watching How Things Move
* Integral Fast Fourier Color Constancy
* Integrated Analysis of Satellite and Geological Data to Characterize Ground Deformation in the Area of Bologna (Northern Italy) Using a Cluster Analysis-Based Approach
* Integrated Approach for Emergency Response and Long-Term Prevention for Rainfall-Induced Landslide Clusters, An
* Integrated Remote Sensing and Near-Surface Geophysical Approach to Detect and Characterize Active and Capable Faults in the Urban Area of Florence (Italy), An
* Integrating UAV-Based RGB Imagery with Semi-Supervised Learning for Tree Species Identification in Heterogeneous Forests
* Intelligent Recognition and Parameter Estimation of Radar Active Jamming Based on Oriented Object Detection
* Intensification Trend and Mechanisms of Oman Upwelling During 1993-2018
* InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
* InteractAnything: Zero-Shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
* Interaction Confidence Attention for Human-Object Interaction Detection
* Interaction-Aware Trajectory Prediction Method Based on Sparse Spatial-Temporal Transformer for Internet of Vehicles
* InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
* Interactive Medical Image Analysis with Concept-based Similarity Reasoning
* Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
* InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
* InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
* Interleaved-Modal Chain-of-Thought
* InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
* Interpretable Generative Models through Post-Hoc Concept Bottlenecks
* Interpretable Image Classification via Non-parametric Part Prototype Learning
* Interpretable Machine Learning Framework for Unraveling the Dynamics of Surface Soil Moisture Drivers, An
* Interpreting Object-level Foundation Models via Visual Precision Search
* Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
* Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
* Investigation of Class Separability Within Object Detection Models in Histopathology
* Investigation of the Characteristics of the Mei-Yu Raindrop Size Distribution and the Limitations of Numerical Microphysical Parameterization, An
* Invisible Backdoor Attack against Self-supervised Learning
* IonoBench: Evaluating Spatiotemporal Models for Ionospheric Forecasting Under Solar-Balanced and Storm-Aware Conditions
* Ionospheric Statistical Study of the ULF Band Electric Field and Electron Density Variations Before Strong Earthquakes Based on CSES Data
* IPAD: Iterative, Parallel, and Diffusion-Based Network for Scene Text Recognition
* IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
* IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images
* IRSD-Net: An Adaptive Infrared Ship Detection Network for Small Targets in Complex Maritime Environments
* Is Right Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
* Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
* Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
* iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
* Isolating Signals in Passive Non-Line-of-Sight Imaging Using Spectral Content
* It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
* ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On
* Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing
* IterIS: Iterative Inference-Solving Alignment for LoRA Merging
* Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
* JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
* Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
* JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
* JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
* JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
* Joint Luminance-Chrominance Learning for Image Debanding
* Joint Optimization Method for Power and Array of Multi-Point Sources System, A
* Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
* Joint Out-of-Distribution Filtering and Data Discovery Active Learning
* Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning
* Joint Vision-Language Social Bias Removal for CLIP
* JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
* Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
* K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
* K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
* KA-MIN: Knowledge-Aware Multimodal Interaction Network for Emotion Recognition in Conversation
* KAC: Kolmogorov-Arnold Classifier for Continual Learning
* KDFE: Robust KNN-Driven Fusion Estimator for LEO-SoOP Under Multi-Beam Phased-Array Dynamics
* Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation
* Kernel Reformulation With Deep Constrained Least Squares for Blind Image Super-Resolution
* KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
* Keyframe-Guided Creative Video Inpainting
* Keystate-Driven Long-Term Generation of Bimanual Object Manipulation Sequences
* KFDNNs-Based Intelligent INS/PS Integrated Navigation Method Without Statistical Knowledge
* Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
* KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities
* Knowledge Bridger: Towards Training-Free Missing Modality Completion
* Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning
* Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
* Knowledge-Based Strategy for Interpretation of SWIR Hyperspectral Images of Rocks, A
* Koala-36M: A Large-Scale Video Dataset Improving Consistency between Fine-Grained Conditions and Video Content
* Kolmogorov-Arnold Networks for Interpretable Crop Yield Prediction Across the U.S. Corn Belt
* KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
* L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers
* Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret
* Label-Efficient Fine-Tuning for Remote Sensing Imagery Segmentation with Diffusion Models
* LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning
* LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
* Land Cover Mapping Using High-Resolution Satellite Imagery and a Comparative Machine Learning Approach to Enhance Regional Water Resource Management
* Land Surface Condition-Driven Emissivity Variation and Its Impact on Diurnal Land Surface Temperature Retrieval Uncertainty
* Landscape Heterogeneity and Transition Drive Wildfire Frequency in the Central Zone of Chile
* Landslide Susceptibility Assessment in Ya'an Based on Coupling of GWR and TabNet
* Language Guided Concept Bottleneck Models for Interpretable Continual Learning
* Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion, The
* Language-Assisted Debiasing and Smoothing for Foundation Model-Based Semi-Supervised Learning
* Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
* Language-Guided Image Tokenization for Generation
* Language-Guided Salient Object Ranking
* Large Language Model With Region-Guided Referring and Grounding for CT Report Generation
* Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection
* Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
* Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
* Laser-Induced Breakdown Spectroscopy Quantitative Analysis Using a Bayesian Optimization-Based Tunable Softplus Backpropagation Neural Network
* Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
* Latent Space Imaging
* Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
* LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
* LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
* LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos
* LaVin-DiT: Large Vision Diffusion Transformer
* Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
* Layered Image Vectorization via Semantic Simplification
* Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos
* LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
* LC-Mamba: Local and Continuous Mamba with Shifted Windows for Frame Interpolation
* LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
* Learnable Infinite Taylor Gaussian for Dynamic View Rendering
* Learnable Non-Uniform Quantization With Sampling-Based Optimization for Variable-Rate Learned Image Compression
* Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
* Learned Image Compression with Dictionary-Based Entropy Model
* Learned Spherical Image Compression With Spherical Convolution-Self-Attention and Transformer Context Model
* Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
* Learning Affine Correspondences by Integrating Geometric Constraints
* Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
* Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
* Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection
* Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval
* Learning Comprehensive Representation via Selective Activation and Dual-Level Orthogonality for Pedestrian Attribute Recognition
* Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning
* Learning Discriminative Representation for Fine-Grained Object Detection in Remote Sensing Images
* Learning Distance Constrained Transformation for Video Tracking in Car-Following
* Learning Dynamic Collaborative Network for Semi-Supervised 3D Vessel Segmentation
* Learning Endogenous Attention for Incremental Object Detection
* Learning Extremely High Density Crowds as Active Matters
* Learning Flow Fields in Attention for Controllable Person Image Generation
* Learning from Neighbors: Category Extrapolation for Long-Tail Learning
* Learning from Streaming Video with Orthogonal Gradients
* Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
* Learning From Vision Foundation Models for Cross-Domain Remote Sensing Image Segmentation
* Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
* Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images
* Learning Lens Blur Fields
* Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
* Learning on Model Weights using Tree Experts
* Learning Paradigm for Selecting Few Discriminative Stimuli in Eye-Tracking Research, A
* Learning Partonomic 3D Reconstruction from Image Collections
* Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model
* Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
* Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
* Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
* Learning Temporally Consistent Video Depth from Video Diffusion Priors
* Learning Textual Prompts for Open-World Semi-Supervised Learning
* Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels
* Learning to Filter Outlier Edges in Global SfM
* Learning to Highlight Audio by Watching Movies
* Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry
* Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
* Learning Visual Composition through Improved Semantic Guidance
* Learning Visual Generative Priors without Text
* Learning with Noisy Triplet Correspondence for Composed Image Retrieval
* Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework
* LEDiff: Latent Exposure Diffusion for HDR Generation
* LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
* Less Attention is More: Prompt Transformer for Generalized Category Discovery
* Less is More: Efficient Image Vectorization with Adaptive Parameterization
* Less is More: Efficient Model Merging with Binary Task Switch
* Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
* Let Humanoids Hike! Integrative Skill Development on Complex Trails
* Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples
* Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation
* Let's Verify and Reinforce Image Generation Step by Step
* Leveragable Adaptive Multi-Scale Features and Learnable Prototypes for Imbalanced Sea State Estimation Based on Ship Motion Data
* Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
* Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images
* Leveraging High-Frequency UAV-LiDAR Surveys to Monitor Earthflow Dynamics: The Baldiola Landslide Case Study
* Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection
* Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition
* Leveraging SD Map to Augment HD Map-based Trajectory Prediction
* Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection
* LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
* LHRS-Bot-Nova: Improved multimodal large language model for remote sensing vision-language interpretation
* Libra-Merging: Importance-Redundancy and Pruning-Merging Trade-Off for Acceleration Plug-In in Large Vision-Language Model
* LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
* LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
* LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition
* Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
* Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
* Lifting Motion to the 3D World via 2D Diffusion
* Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
* Light Transport-aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes
* Light3R-SfM: Towards Feed-forward Structure-from-Motion
* LightLoc: Learning Outdoor LiDAR Localization at Light Speed
* Lightweight Class Incremental Semantic Segmentation Without Catastrophic Forgetting
* Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions, A
* LIM: Large Interpolator Model for Dynamic Reconstruction
* LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
* Linear Attention Modeling for Learned Image Compression
* LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
* LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
* Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
* Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video
* Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation
* Linkage Between Radar Reflectivity Slope and Raindrop Size Distribution in Precipitation with Bright Bands
* LinU-Mamba: Visual Mamba U-Net with Linear Attention to Predict Wildfire Spread
* LIO-GC: LiDAR Inertial Odometry with Adaptive Ground Constraints
* LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
* LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
* LiSu: A Dataset and Method for LiDAR Surface Normal Estimation
* LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
* Live: Learning Video LLM with Streaming Speech Transcription at Scale
* LiVOS: Light Video Object Segmentation with Gated Linear Matching
* LLaFS++: Few-Shot Image Segmentation with Large Language Models
* LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
* LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
* LLaVA-Critic: Learning to Evaluate Multimodal Models
* LLM-driven Multimodal and Multi-Identity Listening Head Generation
* LLM-Guided Decoupled Probabilistic Prompt for Continual Learning in Medical Image Diagnosis
* LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
* LMO: Linear Mamba Operator for MRI Reconstruction
* Local Concept Embeddings for Analysis of Concept Distributions in Vision DNN Feature Spaces
* Local Information-Driven Hierarchical Fusion of SAR and Visible Images via Refined Modal Salient Features
* Locality-Aware Zero-Shot Human-Object Interaction Detection
* Localization of Multiple GNSS Interference Sources Based on Target Detection in C/N0 Distribution Maps
* Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
* Localizing Events in Videos with Multimodal Queries
* Locally Orderless Images for Optimization in Differentiable Rendering
* LOCAT: Localization-Driven Text Watermarking via Large Language Models
* LoCoRe: Image Re-Ranking with Long-Context Sequence Modeling
* LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup
* LogiCzsl: Exploring Logic-induced Representation for Compositional Zero-shot Learning
* Logits DeConfusion with CLIP for Few-Shot Learning
* LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
* LoKi: Low-dimensional KAN for Efficient Fine-tuning Image Models
* Long Short-Term Fusion by Multi-Scale Distillation for Screen Content Video Quality Enhancement
* Long Short-Term Knowledge Decomposition and Consolidation for Lifelong Person Re-Identification
* Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
* Long-Term Hourly Ozone Forecasting via Time-Frequency Analysis of ICEEMDAN-Decomposed Components: A 36-Hour Forecast for a Site in Beijing
* Long-Term Snow Cover Change in the Qilian Mountains (1986-2024): A High-Resolution Landsat-Based Analysis
* Long-Term Surface Deformation Monitoring and Prediction of Hutubi Gas Storage Reservoir in Xinjiang Based on InSAR and the GWO-VMD-GRU Model, The
* LongDiff: Training-Free Long Video Generation in One Go
* LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
* LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
* LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
* LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs
* LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
* LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
* LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
* Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
* LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
* LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table
* Low-Biased General Annotated Dataset Generation
* Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning
* Low-Resolution Self-Attention for Semantic Segmentation
* LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate
* LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
* LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
* LSNet: See Large, Focus Small
* LSTMConvSR: Joint Long-Short-Range Modeling via LSTM-First-CNN-Next Architecture for Remote Sensing Image Super-Resolution
* LT3SD: Latent Trees for 3D Scene Diffusion
* LUCAS: Layered Universal Codec Avatars
* Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
* LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
* Lung Cancer Screening Classification by Sequential Multi-Instance Learning (SMILE) Framework With Multiple CT Scans
* Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
* LWSARDet: A Lightweight SAR Small Ship Target Detection Network Based on a Position-Morphology Matching Mechanism
* Lyapunov-Based Adaptive Neural Network Optimized Backstepping Control of Uncertain Unmanned Fire Fighting Robot
* M-LLM Based Video Frame Selection for Efficient Video Understanding
* M3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
* M3amba: Memory Mamba is All You Need for Whole Slide Image Classification
* M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings
* MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction
* Machine Learning-Based Method for Lithology Identification of Outcrops Using TLS-Derived Spectral and Geometric Features, A
* Machine learning-based retrieval of chlorophyll-a and total suspended matter from HY-3A CZI: Model development, validation, and application
* MAD: Memory-Augmented Detection of 3D Objects
* MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
* MAGE: Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model
* MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
* MagicArticulate: Make Your 3D Models Articulation-Ready
* MagicQuill: An Intelligent Interactive Image Editing System
* MagicTime: Time-Lapse Video Generation Models as Metamorphic Simulators
* Magma: A Foundation Model for Multimodal AI Agents
* Magnetopause Boundary Detection Based on a Deep Image Prior Model Using Simulated Lobster-Eye Soft X-Ray Images
* Maintaining Consistent Inter-Class Topology in Continual Test-Time Adaptation
* MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
* Make It Count: Text-to-Image Generation with an Accurate Number of Objects
* Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
* Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
* Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
* Mamba-Adaptor: State Space Model Adaptor for Visual Recognition
* Mamba-Reg: Vision Mamba Also Needs Registers
* Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
* MambaIC: State Space Models for High-Performance Learned Image Compression
* MambaIRv2: Attentive State Space Restoration
* MambaOut: Do We Really Need Mamba for Vision?*
* MambaVision: A Hybrid Mamba-Transformer Vision Backbone
* MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
* MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
* MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps
* MangaNinja: Line Art Colorization with Precise Reference Following
* Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
* ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
* ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
* MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
* MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation
* Many-Objective Optimization of Railway Alignments With Strengthened Pareto Dominance Analysis
* MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
* Mapping dynamics of large-scale high-precision pond datasets using a semi-automated method based on deep learning
* Mapping Subtidal Marine Forests in the Mediterranean Sea Using Copernicus Contributing Mission
* Mapping Waterbird Habitats with UAV-Derived 2D Orthomosaic Along Belgium's Lieve Canal
* Mapping Wetlands with High-Resolution Planet SuperDove Satellite Imagery: An Assessment of Machine Learning Models Across the Diverse Waterscapes of New Zealand
* MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
* MARBLE: Material Recomposition and Blending in CLIP-Space
* MaRI: Material Retrieval Integration across Domains
* Marine Heatwaves and Cold Spells Accompanied by Mesoscale Eddies Globally
* MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
* Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
* Martian Skylight Identification Based on the Deep Learning Model
* MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
* MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
* Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
* Mask2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
* Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
* Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
* MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
* MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
* Masking meets Supervision: A Strong Learning Alliance
* MaSS13K: A Matting-level Semantic Segmentation Benchmark
* MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
* MAT-MS: A mask-aware transformer for constructing gap-free MODIS normalized difference snow index products
* MatAnyone: Stable Video Matting with Consistent Memory Propagation
* MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
* MATCHA: Towards Matching Anything
* Material Anything: Generating Materials for Any 3D Object via Diffusion
* Matrix-Free Shared Intrinsics Bundle Adjustment
* Matrix3D: Large Photogrammetry Model All-in-One
* Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
* Maximum Likelihood for Logistic Regression Model With Incomplete and Hybrid-Type Covariates
* MBQ: Modality-Balanced Quantization for Large Vision-Language Models
* MC2: Multi-concept Guidance for Customized Multi-concept Generation
* MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
* MCFNet: Multiscale Cross-Modal Fusion Network for Remote Sensing Image Semantic Segmentation
* MDP: Multidimensional Vision Model Pruning with Latency Constraint
* Measurement of Suspended Sediment Concentration at the Outlet of the Yellow River Canyon: Using Sentinel-2 Images and Machine Learning
* MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
* Mechanisms of Ocean Acidification in Massachusetts Bay: Insights from Modeling and Observations
* MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
* Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering
* MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks
* MEFA-Net: Multilevel Feature Extraction and Fusion Attention Network for Infrared Small-Target Detection
* MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
* MEGA: Masked Generative Autoencoder for Human Mesh Recovery
* MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos
* MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
* Memories of Forgotten Concepts
* MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images
* MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
* MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
* Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes
* Mesh-Aligned 3D Gaussian Splatting for Multi-Resolution Anti-Aliasing Rendering
* MeshArt: Generating Articulated Meshes with Structure-Guided Transformers
* MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
* Mesoscale Analysis and Numerical Simulation of an Extreme Precipitation Event on the Northern Slope of the Middle Kunlun Mountains in Xinjiang, China
* MEt3R: Measuring Multi-View Consistency in Generated Images
* Meta-Learning Hyperparameters for Parameter Efficient Fine-Tuning
* MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
* MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
* MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
* Meteorological Drivers and Agricultural Drought Diagnosis Based on Surface Information and Precipitation from Satellite Observations in Nusa Tenggara Islands, Indonesia
* Method for Auto Generating a Remote Sensing Building Detection Sample Dataset Based on OpenStreetMap and Bing Maps, A
* Method of Simplified Synthetic Objects Creation for Detection of Underwater Objects from Remote Sensing Data Using YOLO Networks, A
* Method of Time Alignment in BEV Features for Multimodal Fusion Object Detection of Intelligent Vehicles, A
* Methodology for Evaluating Collision Avoidance Maneuvers Using Aerodynamic Control
* MetricGrids: Arbitrary Nonlinear Approximation with Elementary Metric Grids based Implicit Neural Representation
* MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
* MFDB-Net: Multi-Attention Fusion Dual-Branch Network for Pavement Crack Detection
* MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
* MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
* MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
* MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing
* MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
* MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
* Mimic In-Context Learning for Multimodal Tasks
* Mimir: Improving Video Diffusion Models for Precise Text Understanding
* MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
* MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
* Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch
* Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis
* Mind the Time: Temporally-Controlled Multi-Event Video Generation
* Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
* Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation
* Minding Spatial Allocation Entropy: Sentinel-2 Dense Time Series Spectral Features Outperform Vegetation Indices to Map Desert Plant Assemblages
* MINIMA: Modality Invariant Image Matching
* Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation
* Minimal Solution for Binocular Camera Relative Pose Estimation Based on the Gravity Prior, A
* Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation
* mining scene understanding framework with limited labeled samples jointly driven by object-level spatial relationships and multi-task network, A
* Minority-Focused Text-to-Image Generation via Prompt Optimization
* MIRE: Matched Implicit Neural Representations
* MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
* Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
* Mitigating Ambiguities in 3D Classification with Gaussian Splatting*
* Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
* Mitigating NDVI saturation in imagery of dense and healthy vegetation
* Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
* Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
* MITracker: Multi-View Integration for Visual Object Tracking
* MixerMDM: Learnable Composition of Human Motion Diffusion Models
* Mixture of Submodules for Domain Adaptive Person Search
* MixtureRS: A Mixture of Expert Network Based Remote Sensing Land Classification
* MLLM-as-a-Judge for Image Safety without Human Labeling
* MLVU: Benchmarking Multi-task Long Video Understanding
* MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
* MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
* MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
* MMformer: Transformer-Based Trajectory Map-Matching Model for Large-Scale Road Networks
* MMRL: Multi-Modal Representation Learning for Vision-Language Models
* MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
* MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
* MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
* MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
* MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
* MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
* MoBluRF: Motion Deblurring Neural Radiance Fields for Blurry Monocular Video
* MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis
* MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
* Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing
* Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
* Modeling and Control of PRP-Gantry Crane Systems via Neural IDA-PBC
* Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
* Modeling Multivariable Associations and Inter-Eddy Interactions: A Dual-Graph Learning Framework for Mesoscale Eddy Trajectory Forecasting
* Modeling the Distribution and Richness of Mammalian Species in the Nyerere National Park, Tanzania
* Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
* ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
* MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining
* MoEdit: On Learning Quantity Perception for Multi-object Image Editing
* MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
* MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
* MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
* MoHiPr-TB: A Monthly Gridded Multi-Source Merged Precipitation Dataset for the Tarim Basin Based on Machine Learning
* Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
* MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
* Monitoring Critical Mountain Vertical Zonation in the Surkhan River Basin Based on a Comparative Analysis of Multi-Source Remote Sensing Features
* Monitoring Irish Coastal Heritage Destruction: A Case Study from Inishark, Co. Galway, Ireland
* Monitoring Knee Health: Ultra-Wideband Radar Imaging for Early Detection of Osteoarthritis
* Monitoring Nitrogen Uptake and Grain Quality in Ponded and Aerobic Rice with the Squared Simplified Canopy Chlorophyll Content Index
* Monitoring the Early Growth of Pinus and Eucalyptus Plantations Using a Planet NICFI-Based Canopy Height Model: A Case Study in Riqueza, Brazil
* Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
* Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
* Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
* Monocular and Generalizable Gaussian Talking Head Animation
* Monocular Visual Pose Measurement for Autonomous Landing in Unknown Environments
* MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
* MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
* MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
* MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
* MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
* MonSter: Marry Monodepth to Stereo Unleashes Power
* Monthly Urban Electricity Power Consumption Prediction Using Nighttime Light Remote Sensing: A Case Study of the Yangtze River Delta Urban Agglomeration
* Mooring Observations of Typhoon Trami (2024)-Induced Upper-Ocean Variability: Diapycnal Mixing and Internal Wave Energy Characteristics
* Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
* MOS-Attack: A Scalable Multi-Objective Adversarial Attack Framework
* MOS: Modeling Object-Scene Associations in Generalized Category Discovery
* Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
* Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
* MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
* Most Exigent Eigenvalue Problem for the Stability Analysis of the Vehicle Platoon With Delay
* MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
* MotiF: Making Text Count in Image Animation with Motion Focal Loss
* Motion Modes: What Could Happen Next?
* Motion Prompting: Controlling Video Generation with Motion Trajectories
* Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
* MotionBench: Benchmarking and Improving Fine-Grained Video Motion Understanding for Vision Language Models
* MotionMap: Representing Multimodality in Human Pose Forecasting
* MotionPro: A Precise Motion Controller for Image-to-Video Generation*
* MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond
* Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture
* MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
* Move-in-2D: 2D-Conditioned Human Motion Generation
* MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
* Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
* MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
* MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
* MOWA: Multiple-in-One Image Warping Model
* MP-GUI: Modality Perception with MLLMs for GUI Understanding
* MP-SfM: Monocular Surface Priors for Robust Structure-From-Motion
* MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving
* Mr. DETR: Instructive Multi-Route Training for Detection Transformers
* MSDP-Net: A Multi-Scale Domain Perception Network for HRRP Target Recognition
* MSJosSAR Configuration Optimization and Scattering Mechanism Classification Based on Multi-Dimensional Features of Attribute Scattering Centers
* MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation
* MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting
* MTL-PlotCounter: Multitask Driven Soybean Seedling Counting at the Plot Scale Based on UAV Imagery
* Multi-Band Differential SAR Interferometry for Snow Water Equivalent Retrieval over Alpine Mountains
* Multi-Channel Coupled Variational Bayesian Framework with Structured Sparse Priors for High-Resolution Imaging of Complex Maneuvering Targets
* Multi-Channel Disentangled Graph Neural Networks With Different Types of Self-Constraints
* Multi-Faceted Adaptive Token Pruning for Efficient Remote Sensing Image Segmentation
* Multi-Feature Fusion Approach for Sea Fog Detection Under Complex Background, A
* Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
* Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
* Multi-Group Proportional Representation for Text-to-Image Models
* Multi-Head Graph Attention Adversarial Autoencoder Network for Unsupervised Change Detection Using Heterogeneous Remote Sensing Images
* Multi-Label Learning With Multiple Complementary Labels
* Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
* Multi-Layer Task Offloading Scheme in Fog Computing-Based VANETs With Optimized Completion Delay
* Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
* Multi-Level Contextual Prototype Modulation for Compositional Zero-Shot Learning
* Multi-Level Guided Discrepancy Learning for Source-Free Object Detection in Hazy Conditions
* Multi-Level Semi-Automatic Procedure for the Monitoring of Bridges in Road Infrastructure Using MT-DInSAR Data, A
* Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
* Multi-modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery
* Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets
* Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
* Multi-Modal Long-Short Distance Attention-Based Transformer-GAN for PET Reconstruction With Auxiliary MRI
* Multi-Modal Medical Diagnosis via Large-Small Model Collaboration
* Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
* Multi-modal Topology-embedded Graph Learning for Spatially Resolved Genes Prediction from Pathology Images with Prior Gene Similarity Information
* Multi-modal Vision Pre-training for Medical Image Analysis
* Multi-party Collaborative Attention Control for Image Customization
* Multi-Receiver GNSS System Geometry Control Algorithm in Mobile Measurement of Railway Track Axis Position, A
* Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
* Multi-Scale Feature Extraction with 3D Complex-Valued Network for PolSAR Image Classification
* Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
* Multi-Sensor Flood Mapping in Urban and Agricultural Landscapes of the Netherlands Using SAR and Optical Data with Random Forest Classifier
* Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
* Multi-Source Domain Generalization for Learned Lossless Volumetric Biomedical Image Compression
* Multi-subject Open-set Personalization in Video Generation
* Multi-Task Learning Framework with Enhanced Cross-Level Semantic Consistency for Multi-Level Land Cover Classification, A
* Multi-Temporal Mineral Mapping in Two Torrential Basins Using PRISMA Hyperspectral Imagery
* Multi-View Disparity Estimation Using the Gradient Consistency Model
* Multi-View Pose-Agnostic Change Localization with Zero Labels
* Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
* Multidimensional Identification of County-Level Shrinkage by Improved Mapping of Urban Entities Based on Time-Series Remote Sensing Data: A Case Study of Yangtze River Delta Urban Agglomerations
* Multidimensional Parameter Dynamic Evolution-Based Airdrop Target Prediction Method Driven by Multiple Models, A
* MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
* Multilevel Multimodal Hybrid Mamba-Large Strip Convolution Network for Remote Sensing Semantic Segmentation, A
* Multimodal Autoregressive Pre-training of Large Vision Encoders
* Multimodal Distribution of Positioning Errors in NRTK GNSS CORSs: A Case Study in Sicily (Italy)
* MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities
* MultiMorph: On-Demand Atlas Construction
* Multiple Object Tracking as ID Prediction
* Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
* Multitwine: Multi-Object Compositing with Text and Layout Control
* MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
* MUSt3R: Multi-view Network for Stereo 3D Reconstruction
* MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
* MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
* MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
* Mv-Math: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
* MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
* MVBoost: Boost 3D Reconstruction with Multi-View Refinement
* MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation
* MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
* MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
* MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
* MVSAnywhere: Zero-Shot Multi-View Stereo
* M³amba: CLIP-Driven Mamba Model for Multi-Modal Remote Sensing Classification
* NADER: Neural Architecture Design via Multi-Agent Collaboration
* NADM: Noise-Aware Diffusion Model for Landscape Painting Video Generation
* Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
* Navigating Image Restoration with VAR's Distribution Alignment Prior
* Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features
* Navigation World Models
* Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
* NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
* NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics
* NeMF: Neural Microphysics Fields
* NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
* Nested Diffusion Models Using Hierarchical Latent Priors
* Neural Architecture Search for Hyperspectral Image Classification: A Comprehensive Review and Future Perspectives
* Neural Defocus Light Field Rendering
* Neural Hierarchical Decomposition for Single Image Plant Modeling
* Neural Inverse Rendering from Propagating Light
* Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
* Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning
* Neural Video Compression with Context Modulation
* Neuro-3D: Towards 3D Visual Decoding from EEG Signals
* Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
* Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
* New Accelerated Off-Policy Stochastic Preconditioned TD(0) Algorithm, A
* New Signal Separation and Sampling Duration Estimation Method for ISRJ Based on FRFT and Hybrid Modality Fusion Network, A
* New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations, A
* NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
* NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation
* NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
* NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
* NN-Former: Rethinking Graph Structure in Neural Architecture Representation
* nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
* No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
* No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
* Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement
* Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
* Noise Mitigation of the SMOS L1C Multi-Angle Brightness Temperature Based on the Lookup Table
* Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
* Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
* Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
* NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
* Non-Exemplar Class-Incremental Learning via Prototype Correction and Hierarchical Regularization for Specific Emitter Identification
* Non-Invasive Estimation of Crop Water Stress Index and Irrigation Management with Upscaling from Field to Regional Level Using Remote Sensing and Agrometeorological Data
* Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
* Non-Rigid Point Cloud Registration via Anisotropic Hybrid Field Harmonization
* Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
* NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
* Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
* Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
* Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
* NoT: Federated Unlearning via Weight Negation
* Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
* Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images, A
* Novel Framework for Assessing Urban Green Space Equity Integrating Accessibility and Diversity: A Shenzhen Case Study, A
* Novel Framework for Identifying Driving Heterogeneity Through Action Patterns, A
* Novel Method for Single-Station Lightning Distance Estimation Based on the Physical Time Reversal, A
* Novel View Synthesis with Pixel-Space Diffusion Models
* NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery
* NTClick: Achieving Precise Interactive Segmentation With Noise-tolerant Clicks
* NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics
* NU-AIR: A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles
* Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
* Number it: Temporal Grounding Videos like Flipping Manga
* NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
* NVILA: Efficient Frontier Visual Language Models
* O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
* OAPR: An Offset-Aware Progressive Regression Object Detector
* Object Detection Using Event Camera: A MoE Heat Conduction Based Detector and A New Benchmark Dataset
* Object-aware Sound Source Localization via Audio-Visual Scene Understanding
* Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
* Object-Shot Enhanced Grounding Network for Egocentric Video
* ObjectMover: Generative Object Movement with Video Prior
* Obtaining the Highest Quality from a Low-Cost Mobile Scanner: A Comparison of Several Pipelines with a New Scanning Device
* Occlusion-Aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
* Occlusion-Aware Trajectory Planning With Quantified Risk Constraint for Deadlock Mitigation in Autonomous Driving
* OccMamba: Semantic Occupancy Prediction with State Space Models
* OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
* Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
* ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining
* Odd-One-Out: Anomaly Detection by Comparing with Neighbors
* ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
* ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos
* OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-Eye-View Vehicle Semantic Segmentation
* OFER: Occluded Face Expression Reconstruction
* OffsetOPT: Explicit Surface Reconstruction without Normals
* OKG-ConvGRU: A Domain Knowledge-Guided Remote Sensing Prediction Framework for Ocean Elements
* Olympus: A Universal Task Router for Computer Vision Tasks
* Omni-Deblurring: Capturing Omni-Range Context for Image Deblurring
* Omni-ID: Holistic Identity Representation Designed for Generative Tasks
* Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
* Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
* Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
* Omnidirectional Multi-Object Tracking
* OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
* OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
* OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
* OmniFuse: Composite Degradation-Robust Image Fusion With Language-Driven Semantics
* OmniGen: Unified Image Generation
* OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
* OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
* OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
* OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
* OmniStereo: Real-Time Omnidirectional Depth Estimation with Multiview Fisheye Cameras
* OmniStyle: Filtering High Quality Style Transfer Data at Scale
* On Denoising Walking Videos for Gait Recognition
* On the Consistency of Video Large Language Models in Temporal Comprehension
* On the Generalization of Handwritten Text Recognition Models
* On the Hybrid Algorithm for Retrieving Day and Night Cloud Base Height from Geostationary Satellite Observations
* On the Out-Of-Distribution Generalization of Large Multimodal Models
* On the Role of Non-Localities in Fundamental Diagram Estimation
* On the Universal Approximation Properties of Deep Neural Networks Using MAM Neurons
* On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
* On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events
* On-Orbit Calibration Method for Rotation Axis Misalignment in Rotating Mirror-Based Wide-Field Space Cameras
* Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
* ONDA-Pose: Occlusion-Aware Neural Domain Adaptation for Self-Supervised 6D Object Pose Estimation
* One Diffusion to Generate Them All
* One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
* One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
* One-for-More: Continual Diffusion Model for Anomaly Detection
* One-Minute Video Generation with Test-Time Training
* One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency
* One-Step Event-Driven High-Speed Autofocus
* One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
* One2Any: One-Reference 6D Pose Estimation for Any Object
* Online Map Matching Algorithm for Path-Free Trajectories by Integrating Path-Constrained Trajectories, An
* Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution
* Online Video Understanding: OVBench and VideoChat-Online
* OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
* OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
* Open Ad-hoc Categorization with Contextualized Feature Learning
* Open Set Label Shift with Test Time Out-of-Distribution Reference
* Open-Canopy: Towards Very High Resolution Forest Monitoring
* Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
* Open-World Amodal Appearance Completion
* Open-World Objectness Modeling Unifies Novel Object Detection
* OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
* OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
* OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
* OpenSDI: Spotting Diffusion-Generated Images in the Open World
* Opportunistic Single-Photon Time of Flight
* Optical and SAR Image Registration in Equatorial Cloudy Regions Guided by Automatically Point-Prompted Cloud Masks
* Optical properties of a toxin-producing dinoflagellate and its detection from Sentinel-2 MSI in nearshore waters
* Optical-Flow Guided Prompt Optimization for Coherent Video Generation
* OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
* OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit
* Optimal Control of Maritime Container Terminal Operations for Effective Utilization of Resources
* Optimal Estimation Model for Soil Salinization Based on the FOD-CNN Spectral Index, The
* Optimal Load Balancing of Cooperative UAV-UGV Parcel Pickup to Minimize Completion Time
* Optimal Transport and Central Moment Consistency Regularization for Semi-Supervised Medical Image Segmentation
* Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
* Optimization on Multimodal Network Considering Time Window Under Uncertain Demand
* Optimized Attention Induced Multi Head Convolutional Neural Network for Intrusion Detection Systems in Vehicular Ad Hoc Networks
* Optimized Coded Apertures for Hyperspectral Image Reconstruction via Variant RIP Constant
* Optimizing for the Shortest Path in Denoising Diffusion Model
* Optimizing Hyperspectral Desertification Monitoring Through Metaheuristic-Enhanced Wavelet Packet Noise Reduction and Feature Band Selection
* Optimizing the Sampling Strategy for Future Libera Radiance to Irradiance Conversions
* Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
* OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining
* Order-One Rolling Shutter Cameras
* Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
* ORIDa: Object-centric Real-world Image Composition Dataset
* OSDFace: One-Step Diffusion Model for Face Restoration
* OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
* OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
* OSV: One Step is Enough for High-Quality Image to Video Generation
* Otter: A Multi-Modal Model With In-Context Instruction Tuning
* Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
* Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection
* OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
* OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
* OW-OVD: Unified Open World and Open Vocabulary Object Detection
* P2Object: Single Point Supervised Object Detection and Instance Segmentation
* P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds
* PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
* Paint by Inpaint: Learning to Add Image Objects by Removing Them First
* PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition, The
* PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Möbius Spatial Augmentation
* PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding
* Panoptic Plant Recognition in 3D Point Clouds: A Dual-Representation Learning Approach with the PP3D Dataset
* Panorama Generation From NFoV Image Done Right
* PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
* Paragraph-to-Image Generation with Information-Enriched Diffusion Model
* ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
* Parallel Sequence Modeling via Generalized Spatial Propagation Network
* Parallelized Autoregressive Visual Generation
* Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
* Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
* Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
* Parametric Point Cloud Completion for Polygonal Surface Reconstruction
* PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
* PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
* Partial Distribution Matching via Partial Wasserstein Adversarial Networks
* PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
* PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
* PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
* PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
* PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
* PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
* Path-Based Model for Aberration Correction in Ultrasound Imaging, A
* Pathways on the Image Manifold: Image Editing via Video Generation
* Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
* Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy
* PAVE: Patching and Adapting Video Large Language Models
* Pay Attention to the Foreground in Object-Centric Learning
* PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
* PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
* PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
* PDAA: An End-to-End Polygon Dynamic Adjustment Algorithm for Building Footprint Extraction
* PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation
* PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
* PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization
* Percept, Memory, and Imagine: World Feature Simulating for Open-Domain Unknown Object Detection
* Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
* Perceptual Inductive Bias Is What You Need Before Contrastive Learning
* Perceptual Uncertainty-Aware Motion Planning for Autonomous Driving Based on Adaptive Heuristic Reinforcement Learning
* Perceptual Video Compression with Neural Wrapping
* Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
* Performance Analysis of Stellar Refraction Autonomous Navigation for Cross-Domain Vehicles
* Performance Degradation in Monopulse Angle Measurement of Planar Phased-Array Due to Cross-Polarization Component
* Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
* PerLA: Perceptive 3D language assistant
* PERSE: Personalized 3D Generative Avatars from A Single Portrait
* Person De-reidentification: A Variation-guided Identity Shift Modeling
* PersonaBooth: Personalized Text-to-Motion Generation
* PersonaHOI: Effortlessly Improving Face Personalization in Human-Object Interaction Generation
* Personalized Preference Fine-tuning of Diffusion Models
* Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories
* PFedLAH: Personalized Federated Learning With Lookahead for Adaptive Cross-Modal Hashing
* pFedMxF: Personalized Federated Class-incremental Learning with Mixture of Frequency Aggregation
* PGC: Physics-Based Gaussian Cloth from a Single Pose
* Phase congruency image mosaicking approach for aerial mid-wave infrared low-overlap array scanning images
* PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset
* Phenology-Aware Machine Learning Framework for Chlorophyll Estimation in Cotton Using Hyperspectral Reflectance
* Phenology-Aware Transformer for Semantic Segmentation of Non-Food Crops from Multi-Source Remote Sensing Time Series
* PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos
* Phoenix: A Motion-Based Self-Reflection Framework for Fine-Grained Robotic Action Correction
* Photoacoustic-Strain (PAS) Imaging for Tissue Microcirculation Assessment
* Photographer's Eye: Teaching Multimodal Large Language Models to See and Critique like Photographers, The
* PhyS-EdiT: Physics-aware Semantic Image Editing with Text Description
* PhysAnimator: Physics-Guided Generative Cartoon Animation
* PhysGen3D: Crafting a Miniature Interactive World from a Single Image
* Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
* Physics of Motion, Geometry of Cohesion: A Silky Gaussian Head Avatar Framework
* Physics-Informed Blur Learning Framework for Imaging Systems, A
* PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
* PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
* PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
* PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
* PIAD: Pose and Illumination agnostic Anomaly Detection
* PICD: Versatile Perceptual Image Compression with Diffusion Rendering
* PICO: Reconstructing 3D People In Contact with Objects
* PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers
* PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution
* Piecewise-ICP: Efficient and robust registration for 4D point clouds in permanent laser scanning
* PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram
* Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
* Pippo: High-Resolution Multi-View Humans from a Single Image
* Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
* Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
* PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
* Platoon Control Leveraging Network Performance and State Estimation Under Dynamic V2V Network
* Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
* PLeaS: Merging Models with Permutations and Least Squares
* Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
* Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
* Plug-and-Play Versatile Compressed Video Enhancement
* PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
* PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction
* PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
* Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression
* Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
* Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
* Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
* Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
* PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
* PointSR: Self-Regularized Point Supervision for Drone-View Object Detection
* Pointwise deep learning for leaf-wood segmentation of tropical tree point clouds from terrestrial laser scanning
* PolarBEVU: Multi-Camera 3D Object Detection in Polar Bird's-Eye View via Unprojection
* PolarFree: Polarization-based Reflection-Free Imaging
* Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition, A
* Polarized Color Screen Matting
* PolarNeXt: Rethink Instance Segmentation with Polar Representation
* Poleward Shift of the Equatorial Ionization Anomaly During the Main Phase of the Superstorm on 10 May 2024, The
* PolSAR-SFCGN: An End-to-End PolSAR Superpixel Fully Convolutional Generation Network
* Poly-Autoregressive Prediction for Modeling Interactions
* POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
* PONet: A Compact RGB-IR Fusion Network for Vehicle Detection on OrangePi AIpro
* POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
* POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
* Population Normalization for Federated Learning
* Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
* Pose Priors from Language Models
* Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
* Pose-Guided Transformer for Fine-Grained Action Quality Assessment
* PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation
* PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
* Position Guided Dynamic Receptive Field Network: A Small Object Detection Friendly to Optical and SAR Images
* Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
* Post-pre-training for Modality Alignment in Vision-Language Foundation Models
* POSTA: A Go-to Framework for Customized Artistic Poster Generation
* PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
* PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
* POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
* Potential Field Based Deep Metric Learning
* Potential of Multi-Source Multispectral vs. Hyperspectral Remote Sensing for Winter Wheat Nitrogen Monitoring
* Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
* Power of Context: How Multimodality Improves Image Super-Resolution, The
* PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
* Practical solutions to the relative pose of three calibrated cameras
* PRaDA: Projective Radial Distortion Averaging
* Pre-Instruction for Pedestrians Interacting Autonomous Vehicles With eHMI: Effects on Their Psychology and Walking Behavior
* Precipitation Governs Terrestrial Water Storage Anomaly Decline in the Hengduan Mountains Region, China, Amid Climate Change
* Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
* Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
* PreciseCam: Precise Camera Control for Text-to-Image Generation
* Preconditioners for the Stochastic Training of Neural Fields
* Predicting EV Li-Ion Battery Fires: An Integrated Approach Using Generative AI and Machine Learning Based on Vented Gas Emissions
* PrEditor3D: Fast and Precise 3D Shape Editing
* Preliminary Analysis of a Novel Spaceborne Pseudo Triple-Frequency Radar Observations on Cloud and Precipitation: EarthCARE CPR-GPM DPR Coincidence Dataset
* Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
* Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
* PRFormer: Matching Proposal and Reference Masks by Semantic and Spatial Similarity for Few-Shot Semantic Segmentation
* Prior-free 3D Object Tracking
* Prior-Guided Dual-Reference Contrastive Learning for Underwater Object Detection
* Privacy-Preserving Image Retrieval Based on Thumbnail-Preserving Visual Features
* ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
* Probabilistic Prompt Distribution Learning for Animal Pose Estimation
* Probability Density Geodesics in Image Diffusion Latent Space
* ProbeSDF: Light Field Probes For Neural Surface Reconstruction
* Probing the Mid-level Vision Capabilities of Self-Supervised Learning
* ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
* Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions
* Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models
* Progress-Aware Video Frame Captioning
* Progressive Correspondence Regenerator for Robust 3D Registration
* Progressive Focused Transformer for Single Image Super-Resolution
* Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
* Progressive Shrinkage of the Alpine Periglacial Weathering Zone and Its Escalating Disaster Risks in the Gongga Mountains over the Past Four Decades
* Progressive Terrain Segmentation Network for Navigable Areas With Global Sparsity-Entropy and Fusion-Awareness
* ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
* ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector
* Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
* Projections of Urban Heat Island Effects Under Future Climate Scenarios: A Case Study in Zhengzhou, China
* ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
* Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
* Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
* Prompt-Gated Transformer with Spatial-Spectral Enhancement for Hyperspectral Image Classification
* Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
* PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
* PromptHMR: Promptable Human Mesh Recovery
* Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
* ProReflow: Progressive Reflow with Decomposed Velocity
* Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
* Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations
* ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
* Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
* Prototype-Oriented Clean Subset Extraction for Noisy Long-Tailed Classification
* Provable Privacy Protection Authentication Protocol for Vehicle-to-Vehicle Communication, A
* Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
* Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging
* ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
* PRPSV: Parking Efficiency and Reservation Service Optimization Based on Parking Space View
* PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
* PS-EIP: Robust Photometric Stereo Based on Event Interval Profile
* PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
* PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
* Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection
* Pseudo-EV: Enhancing 3D Visual Grounding With Pseudo Embodied Viewpoint
* PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
* PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
* PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
* PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking
* Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
* PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
* Pyramid Learnable Bandpass Filters for Ultra-High-Definition Image Demoiréing
* PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
* Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs
* Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
* Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
* Q-PART: Quasi-Periodic Adaptive Regression with Test-time Training for Pediatric Left Ventricular Ejection Fraction Regression
* QDNet: Query-Denoising Network for Visual Traffic Knowledge Graph Generation
* QMambaBSR: Burst Image Super-Resolution with Query State Space Model
* QMix: Quality-Aware Learning With Mixed Noise for Robust Retinal Disease Diagnosis
* Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model
* Quaffure: Real-Time Quasi-Static Neural Hair Simulation
* Quality Assessment of Dual-Polarization C-Band SAR Data Influenced by Precipitation Based on Normalized Polarimetric Radar Vegetation Index
* Quantification of MODIS Land Surface Temperature Downscaled by Machine Learning Algorithms
* Quantifying Improvements in Derived Storm Events from Version 07 of GPM IMERG Early, Late, and Final Data Products over North Carolina
* Quantifying Multifactorial Drivers of Groundwater-Climate Interactions in an Arid Basin Based on Remote Sensing Data
* Quantitative Estimation of Vegetation Carbon Source/Sink and Its Response to Climate Variability and Anthropogenic Activities in Dongting Lake Wetland, China
* Quantization without Tears
* QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
* QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers
* Query Efficient Black-Box Visual Prompting with Subspace Learning
* Question-Aware Gaussian Experts for Audio-Visual Question Answering
* Queuing Model and Capacity Analysis for Reservation-Based Autonomous Intersection, A
* R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization
* R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
* R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner
* RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
* RAD: Region-Aware Diffusion Models for Image Inpainting
* Radar Monitoring and Numerical Simulation Reveal the Impact of Underground Blasting Disturbance on Slope Stability
* Radar-Based Fast Code for Rainfall Nowcasting over the Tuscany Region, A
* Radiative Transfer Model-Integrated Approach for Hyperspectral Simulation of Mixed Soil-Vegetation Scenarios and Soil Organic Carbon Estimation
* Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling
* RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
* RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection
* RailFusion: A Lidar-Camera Data Interaction Network for 3-D Railway Object Detection
* RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
* RAMPGrasp: Retentive Attention-Based Multiscale Perception Grasp Detection Network
* RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
* Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
* RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
* RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
* Rapid 3D Camera Calibration for Large-Scale Structural Monitoring
* Rapid Detection and Segmentation of Landslide Hazards in Loess Tableland Areas Using Deep Learning: A Case Study of the 2023 Jishishan Ms 6.2 Earthquake in Gansu, China
* Rapid Distance Estimation to Uncooperative Radars in Electronic Warfare Using Optimization and Transformer-Based Segmentation
* Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
* RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects
* RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
* Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression
* Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
* RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
* RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
* RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions
* RDD: Robust Feature Detector and Descriptor using Deformable Transformer
* Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
* Re-thinking Temporal Search for Long-Form Video Understanding
* Readiness Evaluation of Freeways for Lane-Detection Performance of Lidar-Based Automated Vehicles: A Field Test Analysis
* Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
* Real-Time Driving Style Integration in Deep Reinforcement Learning for Traffic Signal Control
* Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures
* Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs
* RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
* Realistic Test-Time Adaptation of Vision-Language Models
* Realistic virtual forests for understanding forest disturbances and recovery from space
* Reanimating Images using Neural Representations of Dynamic Stimuli
* Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
* ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
* Reasoning in visual navigation of end-to-end trained agents: A Dynamical Systems Approach
* Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
* Reasoning to Attend: Try to Understand How Token Works
* ReCap: Better Gaussian Relighting with Cross-Environment Captures
* ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
* Recent Active Wildland Fires Related to Rossby Wave Breaking (RWB) in Alaska
* Recognition-Synergistic Scene Text Editing
* ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning
* Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
* ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
* Reconstructing Animals and the Wild
* Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
* Reconstructing Humans with a Biomechanically Accurate Skeleton
* Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions
* Reconstructing People, Places, and Cameras
* Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
* Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
* Recovering Dynamic 3D Sketches from Videos
* Rectification-specific Supervision and Constrained Estimator for Online Stereo Rectification
* Rectified Diffusion Guidance for Conditional Generation
* Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
* Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
* Redactable Blockchain-Based Anonymous Announcement Scheme for VANETs, A
* Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
* ReDiffDet: Rotation-equivariant Diffusion Model for Oriented Object Detection
* Reduced Motion Sickness Using Vestibular EEG-Guided tACS Under Mismatched Physical Rotation and VR Visual Motion
* Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds
* Ref-GS: Directional Factorization for 2D Gaussian Splatting
* Reference-Based 3D-Aware Image Editing with Triplanes
* Refined Multipath Correction Model for High-Precision GNSS Deformation Monitoring, A
* RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects
* Regional Assessment of COCTS HY1-C/D Chlorophyll-a and Suspended Particulate Matter Standard Products over French Coastal Waters
* Registration of close-range, multi-lens multispectral imagery by retrieving the scene 3D structure
* Regularization-Guided Equivariant Approach for Image Restoration, A
* Reinforced Neighborhood Search Method Combined With Genetic Algorithm for Multi-Objective Multi-Robot Transportation System, A
* Relation-Rich Visual Document Generator for Visual Information Extraction
* Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
* RelationField: Relate Anything in Radiance Fields
* Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
* Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing
* Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
* Relocate: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
* Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
* Remote Sensing Archaeology of the Xixia Imperial Tombs: Analyzing Burial Landscapes and Geomantic Layouts
* Remote Sensing Evidence of Blue Carbon Stock Increase and Attribution of Its Drivers in Coastal China
* Remote Sensing Image Compression via Wavelet-Guided Local Structure Decoupling and Channel-Spatial State Modeling
* Remote sensing of urban tree carbon stocks: A methodological review
* Remote Sensing Perspective on Monitoring and Predicting Underground Energy Sources Storage Environmental Impacts: Literature Review
* Remote Sensing-Based Analysis of the Coupled Impacts of Climate and Land Use Changes on Future Ecosystem Resilience: A Case Study of the Beijing-Tianjin-Hebei Region
* Remote Sensing-Based Assessment of Evapotranspiration Patterns in a UNESCO World Heritage Site Under Increasing Water Competition
* Remote Sensing-Based Phenology of Dryland Vegetation: Contributions and Perspectives in the Southern Hemisphere
* Remote Target High-Precision Global Geolocalization of UAV Based on Multimodal Visual Servo
* Remotely Sensing Phytoplankton Size Structure in the Mediterranean Sea: Insights from In Situ Data and Temperature-Corrected Abundance-Based Models
* Removing Reflections from RAW Photos
* Rendering-Oriented 3D Point Cloud Attribute Compression Using Sparse Tensor-Based Transformer
* ReNeg: Learning Negative Embedding with Reward Guidance
* RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds
* RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
* Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
* Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
* Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
* Reputation System-Based Vehicle Violation Reporting Service With Invalid Signature Identification in VANETs
* ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
* ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
* Research on Distributed Collaborative Task Planning and Countermeasure Strategies for Satellites Based on Game Theory Driven Approach
* Research on the Autonomous Orbit Determination of Beidou-3 Assisted by Satellite Laser Ranging Technology
* Research on Thermal Environment Influencing Mechanism and Cooling Model Based on Local Climate Zones: A Case Study of the Changsha-Zhuzhou-Xiangtan Urban Agglomeration
* Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion
* ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
* RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration
* Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
* Rethinking Correspondence-based Category-Level Object Pose Estimation
* Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
* Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
* Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
* Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
* Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
* Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection
* Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
* Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification
* Rethinking Query-Based Transformer for Continual Image Segmentation
* Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
* Rethinking Spiking Self-Attention Mechanism: Implementing alpha-XNOR Similarity Calculation in Spiking Transformers
* Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
* Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
* Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
* Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
* Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
* Retraction Notice: Softwarized Resource Management and Allocation With Autonomous Awareness for 6G-Enabled Cooperative Intelligent Transportation Systems
* Retraction Notice: Tensor-Based Secure Truthful Incentive Mechanism for Mobile Crowdsourcing in IoT-Enabled Maritime Transportation Systems
* Retrieval of Cloud, Atmospheric, and Surface Properties from Far-Infrared Spectral Radiances Measured by FIRMOS-B During the 2022 HEMERA Stratospheric Balloon Campaign
* Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
* Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
* Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation, A
* Reversible Decoupling Network for Single Image Reflection Removal
* Reversing Flow for Image Restoration
* ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
* Revisit Weakly Supervised Hashing With Deep Multi-Modal Foundation Models
* Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
* Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
* Revisiting Fairness in Multitask Learning: A Performance-Driven Approach for Variance Reduction
* Revisiting Generative Replay for Class Incremental Object Detection
* Revisiting MAE pre-training for 3D medical image segmentation
* Revisiting Siamese-Based 3D Single Object Tracking With a Versatile Transformer
* Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety
* ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions
* Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
* Rewind: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
* ReWind: Understanding Long Videos with Instructed Learnable Memory
* RGB-D Visual Perception for Occluded Scenes via Event Camera
* RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars
* RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
* Ride-Hailing Service Pattern Recognition and Demand Prediction: A Reinforcement Ensemble Learning With Fuzzy C-Means Clustering Approach
* RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
* RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety
* Risk Evaluation of Agricultural Non-Point Source Pollution in Typical Hilly and Mountainous Areas: A Case Study of Yongchuan District, Chongqing City, China
* RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
* RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
* RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
* RNG: Relightable Neural Gaussians
* RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
* RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
* RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
* RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
* RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
* RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
* Robotic Visual Instruction
* RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
* RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
* Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
* Robust Adaptive Multiple Backtracking VBKF for In-Motion Alignment of Low-Cost SINS/GNSS
* Robust and Efficient Boundary Point Detection Method by Measuring Local Direction Dispersion, A
* Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
* Robust Deep Convolutional Dictionary Model With Alignment Assistance for Multi-Contrast MRI Super-Resolution
* Robust Framework for Bamboo Forest AGB Estimation by Integrating Geostatistical Prediction and Ensemble Learning, A
* Robust Hashing With Bilinear Drift for Image-Text Retrieval
* Robust Image Watermarking Using Bidirection-Interactive and Context-Aware Networks
* Robust Information Delivery and Energy Efficiency Maximization in D2D-Based V2X Network
* Robust Message Embedding via Attention Flow-Based Steganography
* Robust Multi-Object 4D Generation for In-the-wild Videos
* Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
* Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
* Robust Optical and SAR Image Registration Using Weighted Feature Fusion
* Robust Palmprint Recognition via Multi-Stage Noisy Label Selection and Correction
* Robust Tracking Method for Aerial Extended Targets with Space-Based Wideband Radar, A
* Robust Underwater Vehicle Pose Estimation via Convex Optimization Using Range-Only Remote Sensing Data
* Robust Vehicle Localization for Spherical Camera Models: Solution, Framework, and Verification
* Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On
* Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates
* ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
* ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
* RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images
* ROICtrl: Boosting Instance Control for Visual Generation
* Role of Ocean Penetrative Solar Radiation in the Evolution of Mediterranean Storm Daniel, The
* ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence
* RooFormer: Reconstructing detailed 3D roof models from high-resolution remote sensing imagery using transformer
* RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
* RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
* RORem: Training a Robust Object Remover with Human-in-the-Loop
* ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
* Rotation-Equivariant Self-Supervised Method in Image Denoising
* RS+rPPG: Robust Strongly Self-Supervised Learning for rPPG
* RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
* RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
* S2D-LFE: Sparse-to-Dense Light Field Event Generation
* S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
* S3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors
* S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
* SACB-Net: Spatial-Awareness Convolutions for Medical Image Registration
* Safe and Efficient Self-Evolving Algorithm for Decision-Making and Control of Autonomous Driving Systems, A
* SAGRNet: A novel object-based graph convolutional neural network for diverse vegetation cover classification in remotely-sensed imagery
* SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
* SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
* Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs
* SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
* SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
* SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
* SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
* SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
* SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
* Samba: A Unified Mamba-Based Framework for General Salient Object Detection
* SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity
* Sample Adaptive Localized Simple Multiple Kernel K-Means and its Application in Parcellation of Human Cerebral Cortex
* Sample- and Parameter-Efficient Auto-Regressive Image Models
* Sampling Innovation-Based Adaptive Compressive Sensing
* SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
* SapiensID: Foundation for Human Recognition
* SAR Images Despeckling Using Subaperture Decomposition and Non-Local Low-Rank Tensor Approximation
* SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
* SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds
* SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
* SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
* Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
* Satellite to GroundScape: Large-scale Consistent Ground View Generation from Satellite Views
* Satellite-Based Prediction of Water Turbidity Using Surface Reflectance and Field Spectral Data in a Dynamic Tropical Lake
* Satellite-Derived Bathymetry Using Sentinel-2 and Airborne Hyperspectral Data: A Deep Learning Approach with Adaptive Interpolation
* Satellite-Measured Suspended Particulate Matter Flux and Freshwater Flux in the Yellow Sea and East China Sea
* SAVL: Scene-Adaptive UAV Visual Localization Using Sparse Feature Extraction and Incremental Descriptor Mapping
* Scalable Autoregressive Monocular Depth Estimation
* Scalable Multi-View Regression Clustering for Large-Scale Data
* Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
* Scale Efficient Training for Large Datasets
* ScaleLSD: Scalable Deep Line Segment Detection Streamlined
* ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
* Scaling Down Text Encoders of Text-to-Image Diffusion Models
* Scaling Inference Time Compute for Diffusion Models
* Scaling Mesh Generation via Compressive Tokenization
* Scaling Properties of Diffusion Models For Perceptual Tasks
* Scaling up Image Segmentation across Data and Tasks
* Scaling Vision Pre-Training to 4K Resolution
* ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
* ScanDTM: A Novel Dual-Temporal Modulation Scanpath Prediction Model for Omnidirectional Images
* SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
* SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection
* Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
* Scene Language: Representing Scenes with Programs, Words, and Embeddings, The
* Scene Map-based Prompt Tuning for Navigation Instruction Generation
* Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
* Scene-agnostic Pose Regression for Visual Localization
* Scene-Centric Unsupervised Panoptic Segmentation
* Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
* SceneCrafter: Controllable Multi-View Driving Scene Editing
* SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
* SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
* SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
* SceneTracker: Long-Term Scene Flow Estimation Network
* SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
* Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
* Science-T2I: Addressing Scientific Illusions in Image Synthesis
* ScribbleLight: Single Image Indoor Relighting with Scribbles
* SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer
* SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
* SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models
* SDDGRNets: Level-Level Semantically Decomposed Dynamic Graph Reasoning Network for Remote Sensing Semantic Change Detection
* SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
* Sea-Ing in Low-Light
* SEAL: SEmantic Attention Learning for Long Video Representation
* SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
* Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
* Seasonal and Interannual Variations in Hydrological Dynamics of the Amazon Basin: Insights from Geodetic Observations
* Seasonal and Long-Term Water Regime Trends of Cheremsky Wetland: Analysis Based on Sentinel-2 Spectral Indices and Composite Indicator Development
* Seasonally Robust Offshore Wind Turbine Detection in Sentinel-2 Imagery Using Imaging Geometry-Aware Deep Learning
* SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning
* SeCap: Self-Calibrating and Adaptive Prompts for Cross-View Person Re-Identification in Aerial-Ground Networks
* Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
* See Further When Clear: Curriculum Consistency Model
* SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
* SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
* Seeing A 3D World in A Grain of Sand
* Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
* Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks
* Seeing more with less: human-like representations in vision models
* Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
* Seeing the Abstract: Translating the Abstract Language for Vision Language Models
* Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
* Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition
* Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
* SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection
* SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
* SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
* SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
* Segment Any Motion in Videos
* Segment Any-Quality Images with Generative Latent Space Enhancement
* Segment Anything in 3D with Radiance Fields
* Segment Anything, Even Occluded
* Segment Concealed Objects With Incomplete Supervision
* Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation
* Segmenting Maxillofacial Structures in CBCT Volumes
* Selective Cross-View Topology for Deep Incomplete Multi-View Clustering
* Selective Imitation Enhanced Deep Reinforcement Learning for AAV Navigation and Obstacle Avoidance With Sparse Rewards
* Selective Noise Empirical Mode Decomposition
* Selective Re-Learning Mechanism for Hyperspectral Fusion Imaging, A
* Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
* Self-Evolving Visual Concept Library using Vision-Language Critics
* Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
* Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model
* Self-Supervised Contrastive Framework for Specific Emitter Identification with Limited Labeled Data, A
* Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
* Self-Supervised Cross-View Correspondence with Predictive Cycle Consistency
* Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration
* Self-Supervised Learning for Color Spike Camera Reconstruction
* Self-Supervised Learning of End-to-End 3D LiDAR Odometry for Urban Scene Modeling
* Self-Supervised Spatial Correspondence Across Modalities
* SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
* SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations
* Semantic and Expressive Variation in Image Captions Across Languages
* Semantic and Sequential Alignment for Referring Video Object Segmentation
* Semantic Boundary Constrained Network for Visual Place Recognition Under Adverse Conditions
* Semantic Knowledge Complementarity based Decoupling Framework for Semi-supervised Class-imbalanced Medical Image Segmentation, A
* Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
* Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework
* Semantic Segmentation-Based GNSS Signal Occlusion Detection and Optimization Method, A
* Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition
* SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
* SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
* Semi-Automated, Hybrid GIS-AI Approach to Seabed Boulder Detection Using High Resolution Multibeam Echosounder, A
* Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining
* SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation
* SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
* Sensitive Object Trigger-Based Fragile Watermarking for Integrity Verification of Remote Sensing Object Detection Models
* Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
* Sensor-Based Yield Prediction in Durum Wheat Under Semi-Arid Conditions Using Machine Learning Across Zadoks Growth Stages
* Separation of powers: On segregating knowledge from observation in LLM-enabled knowledge-based visual question answering
* Seq-BEV: Semantic Bird-Eye-View Map Generation in Full View Using Sequential Images for Autonomous Driving
* Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
* SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
* SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
* Sequencing Game Approach for Eco-Friendly Platoon Merging in a Connected Environment, A
* SerialGen: Personalized Image Generation by First Standardization Then Personalization
* SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
* SET: Spectral Enhancement for Tiny Object Detection
* Seurat: From Moving Points to Depth
* SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
* SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
* SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
* SfM-Free 3D Gaussian Splatting via Hierarchical Training
* SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
* SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction
* SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
* SGSST: Scaling Gaussian Splatting Style Transfer
* SGV3D: Toward Scenario Generalization for Vision-Based Roadside 3D Object Detection
* Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
* Shadow Generation Using Diffusion Model with Geometry Prior
* Shadow-Robust Pavement Damage Detection Framework Based on RACycle-GAN and DDE-YOLOv8, A
* Shape Abstraction via Marching Differentiable Support Functions
* Shape and Texture: What Influences Reliable Optical Flow Estimation?
* Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions
* ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion
* ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
* Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation
* SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
* SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
* Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection
* ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect
* Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model
* ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
* Show and Segment: Universal Medical Image Segmentation via In-Context Learning
* Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
* ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
* ShowMak3r: Compositional TV Show Reconstruction
* ShowUI: One Vision-Language-Action Model for GUI Visual Agent
* SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
* Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
* Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
* SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
* Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
* SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
* Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
* SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
* SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection
* SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
* Simple Data Augmentation for Feature Distribution Skewed Federated Learning, A
* Simple yet Effective Layout Token in Large Language Models for Document Understanding, A
* Simpler Diffusion: 1.5 FID on ImageNet512 with pixel-space diffusion
* Simplification Is All You Need against Out-of-Distribution Overconfidence
* Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification
* Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision
* SimVS: Simulating World Inconsistencies for Robust View Synthesis
* SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space
* Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
* SinGS: Animatable Single-Image Human Gaussian Splats with Kinematic Priors
* SINR: Sparsity Driven Compressed Implicit Neural Representations
* SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
* Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
* SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons
* SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs
* Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
* SketchAgent: Language-Driven Sequential Sketch Generation
* SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
* Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
* SketchVideo: Sketch-based Video Generation and Editing
* Sketchy Bounding-Box Supervision for 3D Instance Segmentation
* SkillMimic: Learning Basketball Interaction Skills from Demonstrations
* Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
* SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
* SLADE: Shielding against Dual Exploits in Large Vision-Language Models
* SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
* SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
* SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
* SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
* SMA-YOLO: An Improved YOLOv8 Algorithm Based on Parameter-Free Attention Mechanism and Multi-Scale Feature Fusion for Small Object Detection in UAV Images
* Small but Mighty: A Lightweight Feature Enhancement Strategy for LiDAR Odometry in Challenging Environments
* Small Ship Detection Based on Improved Neural Network Algorithm and SAR Images
* SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
* SmartEraser: Remove Anything from Images using Masked-Region Guidance
* SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
* SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
* SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
* SnapGen: Taming High-Resolution Text-To-Image Models for Mobile Devices with Efficient Architectures and Training
* Snow Depth Retrieval Using Sentinel-1 Radar Data: A Comparative Analysis of Random Forest and Support Vector Machine Models with Simulated Annealing Optimization
* SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
* SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
* SocialGesture: Delving into Multi-person Gesture Understanding
* Socially-Compliant Hierarchical Human-Vehicle Collaboration With Multimodal Haptic Steering
* SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction
* Soft Self-labeling and Potts Relaxations for Weakly-Supervised Segmentation
* SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
* SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
* SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
* Soil Moisture Prediction Using Remote Sensing and Machine Learning Algorithms: A Review on Progress, Challenges, and Opportunities
* Soil Moisture Prediction Using the VIC Model Coupled with LSTMseq2seq
* Soil Moisture-Informed Seismic Landslide Model Using SMAP Satellite Data, A
* Soil Organic Matter (SOM) Mapping in Subtropical Coastal Mountainous Areas Using Multi-Temporal Remote Sensing and the FOI-XGB Model
* SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
* Solar cities: Multiple-reflection within urban canyons
* SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
* Solving Instance Detection from an Open-World Perspective
* SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
* SonarNet: Global Feature-Based Hybrid Attention Network for Side-Scan Sonar Image Segmentation
* Sonata: Self-Supervised Learning of Reliable Point Representations
* Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
* Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues
* SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
* SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
* SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
* SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
* SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
* SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
* Sparse Decomposition-Based Anti-Spoofing Framework for GNSS Receiver: Spoofing Detection, Classification, and Position Recovery
* Sparse LaneFormer: End-to-End Lane Detection With Sparse Proposals and Interactions
* Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians
* Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
* Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
* SparseAlign: A Fully Sparse Framework for Cooperative Object Detection
* Spatial and Spectral Structure-Aware Mamba Network for Hyperspectral Image Classification
* Spatial and Temporal Expansion of Photovoltaic Sites and Thermal Environmental Effects in Ningxia Based on Remote Sensing and Deep Learning
* Spatial Downscaling of GRACE Groundwater Storage Based on DTW Distance Clustering and an Analysis of Its Driving Factors
* Spatial Downscaling of Sea Level Anomaly Using a Deep Separable Distillation Network
* Spatial Evolution and Driving Mechanisms of Vegetation Cover in China's Greater Khingan Mountains Based on Explainable Geospatial Machine Learning
* Spatial Re-Parameterization for N:M Sparsity
* Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
* Spatial-spectral adaptive generalization driven high-precision BRDF modeling for extensible areas using UAV-Borne remote sensing
* Spatial-Spectral Feature Fusion and Spectral Reconstruction of Multispectral LiDAR Point Clouds by Attention Mechanism
* Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
* Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
* SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
* SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
* SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
* Spatio-Temporal Characteristics of the Morphological Development of Gully Erosion on the Chinese Loess Plateau
* Spatiotemporal Analysis of Drought Variation from 2001 to 2023 in the China-Mongolia-Russia Transboundary Heilongjiang River Basin Based on ITVDI
* Spatiotemporal Analysis of Eco-Geological Environment Using the RAGA-PP Model in Zigui County, China
* Spatiotemporal Characteristics of and Factors Influencing CO2 Concentration During 2010-2023 in China
* Spatiotemporal Characteristics of Ground Subsidence in Xiong'an New Area Revealed by a Combined Observation Framework Based on InSAR and GNSS Techniques
* Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
* Spatiotemporal Dynamics and Driving Factors of Soil Wind Erosion in Inner Mongolia, China
* Spatiotemporal Evolution of Precipitation Concentration in the Yangtze River Basin (1960-2019): Associations with Extreme Heavy Precipitation and Validation Using GPM IMERG
* Spatiotemporal Mapping and Driving Mechanism of Crop Planting Patterns on the Jianghan Plain Based on Multisource Remote Sensing Fusion and Sample Migration
* Spatiotemporal Mapping of Soil Profile Moisture in Oases in Arid Areas
* Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
* Spatiotemporal Super-Resolution of Satellite Sea Surface Salinity Based on a Progressive Transfer Learning-Enhanced Transformer
* Spatiotemporal Variability of Cloud Parameters and Their Climatic Impacts over Central Asia Based on Multi-Source Satellite and ERA5 Data
* Spatiotemporal Variation in Fractional Vegetation Coverage and Quantitative Analysis of Its Driving Forces: A Case Study in the Tabu River Basin, Northern China, 1986-2023
* SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs
* Spectral Informed Mamba for Robust Point Cloud Processing
* Spectral State Space Model for Rotation-Invariant Visual Representation Learning
* SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis
* SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting
* SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
* Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
* SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
* Spherical Manifold Guided Diffusion Model for Panoramic Image Generation
* Spiking Transformer with Spatial-Temporal Attention
* Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
* SpiritSight Agent: Advanced GUI Agent with One Look
* Spk2SRImgNet: Super-Resolve Dynamic Scene from Spike Stream via Motion Aligned Collaborative Filtering
* SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
* SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
* SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
* Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images
* SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
* Split Adaptation for Pre-trained Vision Transformers
* SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
* Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
* SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting
* SR-YOLO: Spatial-to-Depth Enhanced Multi-Scale Attention Network for Small Target Detection in UAV Aerial Imagery
* SRW-YOLO: A Detection Model for Environmental Risk Factors During the Grid Construction Phase
* SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization
* ST-CFNet: Spatio-Temporal Cross-Feature Fusion Networks for 3D Human Pose Estimation
* STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
* Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning
* Stable Flow: Vital Layers for Training-Free Image Editing
* Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
* StableAnimator: High-Quality Identity-Preserving Human Image Animation
* StabStitch++: Unsupervised Online Video Stitching With Spatiotemporal Bidirectional Warps
* Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
* STADe: Sensory Temporal Action Detection via Temporal-Spectral Representation Learning
* StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts
* Standard Classes for Urban Topographic Mapping with ALS: Classification Scheme and a First Implementation
* STANet-TLA: leveraging deep learning and prior knowledge for large-scale soybean breeding plot segmentation and high-yielding variety screening from UAV time-series data
* Star with Bilinear Mapping
* STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds
* StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
* StarVector: Generating Scalable Vector Graphics Code from Images and Text
* Statistical Approach to Research on the Relationship Between Kp/Dst Geomagnetic Indices and Total GPS Position Error
* STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond
* STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
* STDD: Spatio-Temporal Dual Diffusion for Video Generation
* StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
* Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
* Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
* Steepest Descent Density Control for Compact 3D Gaussian Splatting
* Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
* STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
* STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search
* Stepwise Building Damage Estimation Through Time-Scaled Multi-Sensor Integration: A Case Study of the 2024 Noto Peninsula Earthquake
* Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
* Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
* STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
* STHVC: Spatial-Temporal Hybrid Video Compression for UAV-Assisted IoV Systems
* StickMotion: Generating 3D Human Motions by Drawing a Stickman
* STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
* STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
* STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation
* Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs, A
* Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic
* Stop learning it all to mitigate visual hallucination, Focus on the hallucination target
* Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent
* STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
* StoryGPT-V: Large Language Models as Consistent Story Visualizers
* STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
* Strategies for Soil Salinity Mapping Using Remote Sensing and Machine Learning in the Yellow River Delta
* StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
* StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
* Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
* Striving for Faster and Better: A One-Layer Architecture With Auto Re-Parameterization for Low-Light Image Enhancement
* Structure Causal Models and LLMs Integration in Medical Visual Question Answering
* Structure from Collision
* Structure-Aware Correspondence Learning for Relative Pose Estimation
* Structure-From-Motion with a Non-Parametric Camera Model
* Structured 3D Latents for Scalable and Versatile 3D Generation
* Study of Antarctic Sea Ice Based on Shipborne Camera Images and Deep Learning Method
* Study of the Characteristics of a Co-Seismic Displacement Field Based on High-Resolution Stereo Imagery: A Case Study of the 2024 MS7.1 Wushi Earthquake, Xinjiang
* Study on Correction Methods for GPM Rainfall Rate and Radar Reflectivity Using Ground-Based Raindrop Spectrometer Data
* Study on Lithospheric Tectonic Features of Tianshan and Adjacent Regions and the Genesis Mechanism of the Wushi Ms7.1 Earthquake
* Study on the Dynamic Changes in Land Cover and Their Impact on Carbon Stocks in Karst Mountain Areas: A Case Study of Guiyang City
* Study on the Effect of Shortwave Radiation in Land Surface Temperature Downscaling over Rugged Terrain
* Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection
* Style Quantization for Data-Efficient GAN Training
* Style-Editor: Text-driven object-centric style editing
* StyleMaster: Stylize Your Video with Artistic Generation and Translation
* StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
* StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
* Subauroral and Auroral Conditions in the Mid- and Low-Midlatitude Ionosphere over Europe During the May 2024 Mother's Day Superstorm
* Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos
* Submarine Topography Classification Using ConDenseNet with Label Smoothing Regularization
* Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
* Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
* Sufficient Invariant Learning for Distribution Shift
* SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
* SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation
* SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
* Supervising Sound Localization by In-the-wild Egomotion
* Supervision by Denoising
* Supplementary Prompt Learning for Vision-Language Models
* Surface Damage Detection in Hydraulic Structures from UAV Images Using Lightweight Neural Networks
* Surface Reconstruction Planning with High-Quality Satellite Stereo Pairs Searching
* Surgeon: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
* Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision, A
* SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
* SVFR: A Unified Framework for Generalized Video Face Restoration
* SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering
* SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
* SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
* Swin-FSNet: A Frequency-Aware and Spatially Enhanced Network for Unpaved Road Extraction from UAV Remote Sensing Imagery
* SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
* Symbolic Representation for Any-to-Any Generative Tasks
* SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
* Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
* Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
* SyncSDE: A Probabilistic Framework for Diffusion Synchronization
* SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
* SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
* Synergistic Multi-Model Approach for GPR Data Interpretation: Forward Modeling and Robust Object Detection
* Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation
* SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
* Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
* Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
* Synthetic Visual Genome
* Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
* SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
* S²KAN-SLAM: Elastic Neural LiDAR SLAM With SDF Submaps and Kolmogorov-Arnold Networks
* T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning
* T-FAKE: Synthesizing Thermal Images for Facial Landmarking
* T-STAR: Time-Optimal Swarm Trajectory Planning for Quadrotor Unmanned Aerial Vehicles
* T2ICount: Enhancing Cross-Modal Understanding for Zero-Shot Counting
* T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
* T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
* T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
* TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion
* TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning
* TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
* TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model
* TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
* Tailoring Energy Efficiency for Urban Electric Buses: The GTECM Model for Enhanced Range and Sustainable Operation Using Real-Time Big Data
* Take the Bull by the Horns: Learning to Segment Hard Samples
* Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets, A
* TAM-TR: Text-guided attention multi-modal transformer for object detection in UAV images
* Taming Teacher Forcing for Masked Autoregressive Video Generation
* Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
* TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
* TANGO: Training-free Embodied AI Agents for Open-world Tasks
* TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
* TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
* TARA 2.0 for Connected and Automated Vehicles
* Targeted Forgetting of Image Subgroups in CLIP Models
* TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification
* Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
* Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
* Task Singular Vectors: Reducing Task Interference in Model Merging
* Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
* Task-Aware Attentional Dynamic Alignment for Few-Shot Compressed Video Classification
* Task-Aware Clustering for Prompting Vision-Language Models
* Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding
* Task-driven Image Fusion with Learnable Fusion Loss
* Task-Specific Gradient Adaptation for Few-Shot One-Class Classification
* Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
* TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
* Taxonomy-Aware Evaluation of Vision-Language Models
* Taylor-Series-Expansion-Based Vision Transformer Models
* TCFG: Tangential Damping Classifier-free Guidance
* Teaching in adverse scenes: a statistically feedback-driven threshold and mask adjustment teacher-student framework for object detection in UAV images under adverse scenes
* Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
* Technical Note on AI-Driven Archaeological Object Detection in Airborne LiDAR Derivative Data, with CNN as the Leading Technique, A
* Tectonic Rift-Related Manganese Mineralization System and Its Geophysical Signature in the Nanpanjiang Basin
* Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
* Temporal Action Detection Model Compression by Progressive Block Drop
* Temporal Alignment-Free Video Matching for Few-Shot Action Recognition
* Temporal and Spatial Perception: A Novel Perceptual Rate-Distortion Optimization Method for H.266/VVC Encoding
* Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
* Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
* Temporally Consistent Object-Centric Learning by Contrasting Slots
* TensoFlow: Tensorial Flow-Based Sampler for Inverse Rendering
* Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
* Terrain-Constrained Cross-Correlation Matching Method for Laser Footprint Geolocation, A
* Test-time augmentation improves efficiency in conformal prediction
* Test-Time Backdoor Detection for Object Detection Models
* Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation
* Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability
* Test-Time Training for Hyperspectral Image Super-Resolution
* Test-Time Visual In-Context Tuning
* TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer
* TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
* Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
* Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
* Text to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning
* Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
* Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
* Textured Gaussians for Enhanced 3D Scene Appearance Modeling
* Textureless Deformable Object Tracking With Invisible Markers
* TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
* TGAvatar: Reconstructing 3D Gaussian Avatars With Transformer-Based Tri-Plane
* Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
* Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains, A
* Theory-Inspired Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels
* Thermal Multi-Sensor Assessment of the Spatial Sampling Behavior of Urban Landscapes Using 2D Turbulence Indicators
* Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields
* Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
* Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
* Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion
* Three-view Focal Length Recovery From Homographies
* Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
* TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction
* Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone
* Tiled Diffusion
* TIMA-Net: A Lightweight Remote Sensing Image Change Detection Network Based on Temporal Interaction Enhancement and Multi-Scale Aggregation
* Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields
* Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
* TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
* TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
* Tiny object detection with single point supervision
* TinyFusion: Diffusion Transformers Learned Shallow
* TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
* TMTS: A Physics-Based Turbulence Mitigation Network Guided by Turbulence Signatures for Satellite Video
* Token Cropr: Faster ViTs for Quite a Few Tasks
* TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
* TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
* Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
* Tokenized Generative Speech Enhancement With Language Model and Flow Matching
* TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation
* TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression
* TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
* TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
* TORA: Trajectory-oriented Diffusion Transformer for Video Generation
* Total solution for simultaneous pose and correspondence estimation of drone images in urban environments
* Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction
* Toward a Holistic Evaluation of Robustness in CLIP Models
* Toward a Universal, Transferable, and Robust Adversarial Perturbation Framework Against Deep Hashing-Based Facial Image Retrieval
* Toward Adaptive and Coordinated Transportation Systems: A Multi-Personality Multi-Agent Meta-Reinforcement Learning Framework
* Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
* Toward Interactive Sound Source Localization: Better Align Sight and Sound!
* Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
* Toward Robust Neural Reconstruction from Sparse Point Sets
* Toward Understanding the Generalizability of Delayed Stochastic Gradient Descent
* Toward Unified 3D Object Detection via Algorithm and Data Unification
* Toward Unifying Saliency Transformer for Video Saliency Prediction and Detection
* Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
* Towards All-in-One Medical Image Re-Identification
* Towards Autonomous Micromobility through Scalable Urban Simulation
* Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
* Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
* Towards Continual Universal Segmentation
* Towards Cost-Effective Learning: A Synergy of Semi-Supervised and Active Learning
* Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients
* Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
* Towards Efficient SAR Ship Detection: Multi-Level Feature Fusion and Lightweight Network Design
* Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
* Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns
* Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather
* Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
* Towards General Visual-Linguistic Face Forgery Detection
* Towards Generalizable Scene Change Detection
* Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
* Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
* Towards Human-Understandable Multi-Dimensional Concept Discovery
* Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
* Towards In-the-wild 3D Plane Reconstruction from a Single Image
* Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
* Towards Lossless Implicit Neural Representation via Bit Plane Decomposition
* Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks
* Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
* Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
* Towards Open-Vocabulary Audio-Visual Event Localization
* Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging
* Towards Practical Real-Time Neural Video Compression
* Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion
* Towards Precise Scaling Laws for Video Diffusion Transformers
* Towards RAW Object Detection in Diverse Conditions
* Towards Realistic Example-based Modeling via 3D Gaussian Stitching
* Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
* Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
* Towards Smart Point-and-Shoot Photography
* Towards Source-Free Machine Unlearning
* Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
* Towards Training-Free Anomaly Detection with Vision and Language Foundation Models
* Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
* Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
* Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
* Towards Understanding How Knowledge Evolves in Large Vision-Language Models
* Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network
* Towards Universal Dataset Distillation via Task-Driven Diffusion
* Towards Universal Soccer Video Understanding
* Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
* Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
* Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
* Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline
* Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
* Tracking Lava Flow Cooling from Space: Implications for Erupted Volume Estimation and Cooling Mechanisms
* Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
* TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
* Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
* Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
* Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights
* Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
* Transfer Learning Based on Multi-Branch Architecture Feature Extractor for Airborne LiDAR Point Cloud Semantic Segmentation with Few Samples
* Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
* Transferring Prior Thermal Knowledge for Snowy Urban Scene Semantic Segmentation
* Transfontanelle Thermoacoustic Imaging of Intraventricular Brain Hemorrhages in Live Sheep
* Transformers without Normalization
* TransPixeler: Advancing Text-to-Video Generation with Transparency
* Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model
* TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
* Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning
* Triple Dynamic Graph Convolutional Recurrent Network for Traffic Prediction
* TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features
* Trust Model-Based Consensus Optimization for Vehicle Platooning Networks: A Novel Deep Reinforcement Learning Approach With GenAI
* TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
* TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
* TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond
* Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
* Turbo3D: Ultra-fast Text-to-3D Generation
* TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
* TVG: A Training-Free Transition Video Generation Method With Diffusion Models
* Twinner: Shining Light on Digital Twins in a Few Snaps
* Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
* Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
* Two-Stage Physical-Informed Neural Network Approach for High-Speed Railway Track Geometry Irregularity Maintenance, A
* Type-R: Automatically Retouching Typos for Text-to-Image Generation
* U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening
* UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References
* UAS Remote Sensing for Coastal Wetland Vegetation Biomass Estimation: A Destructive vs. Non-Destructive Sampling Experiment
* UAV-Based Multi-Scenario RGB-Thermal Dataset and Fusion Model for Enhanced Forest Fire Detection, A
* UAV-Satellite Cross-View Image Matching Based on Adaptive Threshold-Guided Ring Partitioning Framework
* UCM-VeID V2: A Richer Dataset and A Pre-Training Method for UAV Cross-Modality Vehicle Re-Identification
* UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning
* UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts
* UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
* UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
* UMD-Net: A Unified Multi-Task Assistive Driving Network Based on Multimodal Fusion
* UMFN: Unified Multi-Domain Face Normalization for Joint Cross-domain Prototype Learning and Heterogeneous Face Recognition
* UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units
* Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing
* Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
* Unboxed: Geometrically and Temporally Consistent Video Outpainting
* Uncertain Multimodal Intention and Emotion Understanding in the Wild
* Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection
* Uncertainty Neural Surfaces for Space Target 3D Reconstruction Under Constrained Views
* Uncertainty Propagation From Projections to Region Counts in Tomographic Imaging: Application to Radiopharmaceutical Dosimetry
* Uncertainty Quantification and Quality Control for Heatmap-Based Landmark Detection Models
* Uncertainty Quantification Using Variance Inference Ensemble Network for Object Detection
* Uncertainty Weighted Gradients for Model Calibration
* Uncertainty-Aware Safe Trajectory Planner Based on Model Predictive Control for Autonomous Driving
* Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
* Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
* UnCommon Objects in 3D
* Underground Goaf Locating Framework Based on D-InSAR with Three Different Prior Geological Information Conditions, An
* Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
* Understanding multi-layered transmission matrices
* Understanding Multi-Task Activities from Single-Task Videos
* Underwater Image Enhancement via Wavelet Decomposition Fusion of Advantage Contrast
* UNeLF: Unconstrained Neural Light Field for Self-Supervised Angular Super-Resolution
* UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
* Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
* Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
* UniAlign: Scaling Multimodal Alignment within One Unified Model
* UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
* UNIC-Adapter: Unified Image-Instruction Adapter with Multi-Modal Transformer for Image Generation
* UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
* UniEmoX: Cross-Modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
* Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions, A
* Unified Dense Prediction of Video Diffusion
* Unified Framework for Heterogeneous Semi-supervised Learning, A
* Unified Image-Dense Annotation Generation Model for Underwater Scenes, A
* Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization, A
* Unified Medical Lesion Segmentation via Self-referring Indicator
* Unified Model for Compressed Sensing MRI Across Undersampling Patterns, A
* Unified Open Adapter for Open-World Noisy Label Learning: Data-Centric and Learning-Based Insights, A
* Unified Reconstruction of Static and Dynamic Scenes from Events
* Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
* Unified, Resilient, and Explainable Adversarial Patch Detector, A
* UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
* UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
* UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
* UniK3D: Universal Camera Monocular 3D Estimation
* UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
* UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
* UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
* UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
* UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
* UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
* UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
* UniScene: Unified Occupancy-centric Driving Scene Generation
* UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
* Unity in Diversity: Video Editing via Gradient-Latent Purification
* UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
* Universal Actions for Enhanced Embodied Foundation Models
* Universal Domain Adaptation for Semantic Segmentation
* universal physically-based topographic correction framework for high-resolution optical satellite data, A
* Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts, A
* Universal Scene Graph Generation
* Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
* Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
* Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
* Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
* Unlocking Generalization Power in LiDAR Point Cloud Registration
* Unlocking Potato Phenology: Harnessing Sentinel-1 and Sentinel-2 Synergy for Precise Crop Stage Detection
* Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization
* Unmanned Aerial Vehicle-Based RGB Imaging and Lightweight Deep Learning for Downy Mildew Detection in Kimchi Cabbage
* UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
* Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
* Unraveling the Spatiotemporal Dynamics of Rubber Phenology in Hainan Island, China: A Multi-Sensor Remote Sensing and Climate Drivers Analysis
* Unseen Visual Anomaly Generation
* Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
* Unsupervised Discovery of Facial Landmarks and Head Pose
* Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
* Unsupervised Representation Learning for Monitoring Rail Infrastructures With High-Frequency Moving Vibration Sensors
* Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
* Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection
* Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
* Unveiling Drivers and Projecting Future Risks of Desertification Vulnerability in the Mongolian Plateau
* Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
* Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
* Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
* UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
* Upscaling Frameworks Drive Prediction Accuracy and Uncertainty When Mapping Aboveground Biomass Density from the Synergism of Spaceborne LiDAR, SAR, and Passive Optical Data
* Urban Expansion and the Loss of Agricultural Lands and Forest Cover in Limbe, Cameroon
* Urban-Rural Differences in Cropland Loss and Fragmentation Caused by Construction Land Expansion in Developed Coastal Regions: Evidence from Jiangsu Province, China
* UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
* URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration
* Using Diffusion Priors for Video Amodal Segmentation
* Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing
* Using Zodiacal Light for Spaceborne Calibration of Polarimetric Imagers
* USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
* UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
* UWAV: Uncertainty-Weighted Weakly-Supervised Audio-Visual Video Parsing
* v-CLR: View-Consistent Learning for Open-World Instance Segmentation
* V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
* V2 Dial: Unification of Video and Visual Dialog via Multimodal Experts
* V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy
* V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
* VAAC-IM: Motion-Aware Viewing Area Adaptive Control in Immersive Media Transmission
* Validation of Global Moderate-Resolution FAPAR Products over Boreal Forests in North America Using Harmonized Landsat and Sentinel-2 Data
* Validation of Two Operative Google Earth Engine Applications to Generate 10 m Land Surface Temperature Maps at Daily to Weekly Temporal Resolutions
* Variable Fractional Network Evolutionary Game for Distributed Resilient Task Allocation in Heterogeneous Multi-Robot Systems
* Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models
* Variational Positive-Incentive Noise: How Noise Benefits Models
* VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
* VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
* VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
* Vegetation Baseline and Urbanization Development Level: Key Determinants of Long-Term Vegetation Greening in China's Rapidly Urbanizing Region
* VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
* VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
* VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
* Verifying the Effects of the Grey Level Co-Occurrence Matrix and Topographic-Hydrologic Features on Automatic Gully Extraction in Dexiang Town, Bayan County, China
* Vertex Correspondence and Self-Intersection Reduction in Cortical Surface Reconstruction
* vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
* VEU-Bench: Towards Comprehensive Understanding of Video Editing
* VGGT: Visual Geometry Grounded Transformer
* VI3NR: Variance Informed Initialization for Implicit Neural Representations
* ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
* Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
* Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
* Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
* VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
* VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
* Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
* Video Depth without Video Models
* Video Language Model Pretraining with Spatio-temporal Masking
* Video Motion Transfer with Diffusion Transformers
* Video Summarization with Large Language Models
* Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
* Video-Bench: Human-Aligned Video Generation Benchmark
* Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
* Video-Guided Foley Sound Generation with Multimodal Controls
* Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
* Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
* Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
* VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
* VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
* VideoDirector: Precise Video Editing via Text-to-Video Models
* VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
* VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
* VideoGEM: Training-Free Action Grounding in Videos
* VideoGigaGAN: Towards Detail-rich Video Super-Resolution
* VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
* VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
* VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
* VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
* VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
* VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
* VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
* VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
* VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
* VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
* VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
* VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
* VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models
* VidTwin: Video VAE with Decoupled Structure and Dynamics
* Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
* ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap
* ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network
* VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
* VinaBench: Benchmark for Faithful and Consistent Visual Narratives
* VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
* VIRES: Video Instance Repainting via Sketch and Text Guided Generation
* VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
* Visibility-Aware Multi-View Stereo by Surface Normal Weighting for Occlusion Robustness
* Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes
* Vision-Language Embodiment for Monocular Depth Estimation
* Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
* Vision-Language Model IP Protection via Prompt-based Learning
* Vision-Language Models Do Not Understand Negation
* VisionArena: 230K Real World User-VLM Conversations with Preference Labels
* VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
* VisionZip: Longer is Better but Not Necessary in Vision Language Models
* VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
* VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation
* ViStream: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
* Visual Agentic AI for Spatial Reasoning with a Dynamic API
* Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
* Visual Consensus Prompting for Co-Salient Object Detection
* Visual Lexicon: Rich Image Features in Language Space
* Visual Persona: Foundation Model for Full-Body Human Customization
* Visual Prompting for One-shot Controllable Video Editing without Inversion
* Visual Representation Learning through Causal Intervention for Controllable Image Editing
* Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
* VITED: Video Temporal Evidence Distillation
* ViUniT: Visual Unit Tests for More Robust Visual Programming
* VJDNet: A Simple Variational Joint Discrimination Network for Cross-Image Hyperspectral Anomaly Detection
* VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
* VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
* VladVA: Discriminative Fine-tuning of LVLMs
* VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
* VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
* VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
* VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
* VMMT-Net: A Dual-Branch Parallel Network Combining Visual State Space Model and Mix Transformer for Land-Sea Segmentation of Remote Sensing Images
* VoCo-LLaMA: Towards Vision Compression with Large Language Models
* VODiff: Controlling Object Visibility Order in Text-to-Image Generation
* VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
* Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
* Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
* Volumetrically Consistent 3D Gaussian Rasterization
* VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
* VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
* VSNet: Focusing on the Linguistic Characteristics of Sign Language
* VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
* VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
* Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft
* Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
* WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
* WaveFusion: A Novel Wavelet Vision Transformer With Saliency-Guided Enhancement for Multimodal Image Fusion
* Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
* Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
* Weakly Supervised Network for Coarse-to-Fine Change Detection in Hyperspectral Images, A
* Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion
* Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
* WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
* Wearable Galvanic Vestibular Stimulation Device With Adaptive Motion Sickness Risk Prediction and Symptoms Alleviation, A
* WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
* WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
* weight-adaptive updated method for grasshopper habitat mapping at the national scale using remote sensing: Combined with spatial heterogeneity and landscape, A
* WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
* What Makes a Good Dataset for Knowledge Distillation?
* What's in the Image? A Deep-Dive into the Vision of Vision Language Models
* When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
* When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
* Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
* Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
* Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
* WHU-RS19 ABZSL: An Attribute-Based Dataset for Remote Sensing Image Understanding
* Widely Linear Complex-Valued Affine Projection Algorithm With a Sliding-Window Step-Size
* WildAvatar: Learning In-the-wild 3D Avatars from the Web
* WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
* WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
* WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
* WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels
* WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images
* Wonderland: Navigating 3D Scenes From a Single Image
* WonderWorld: Interactive 3D Scene Generation from a Single Image
* Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
* World-consistent Video Diffusion with Explicit 3D Modeling
* X-Dyna: Expressive Dynamic Human Image Animation
* XCH4 Spatiotemporal Variations in a Natural-Gas-Exploiting Basin with Intensive Agriculture Activities Using Multiple Remote Sensing Datasets: Case from Sichuan Basin, China
* XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
* Yo'Chameleon: Personalized Vision and Language Generation
* You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
* Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
* Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
* Your ViT is Secretly an Image Segmentation Model
* Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
* Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
* Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
* Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression
* Zero-Shot 4D Lidar Panoptic Segmentation
* Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
* Zero-Shot Head Swapping in Real-World Scenarios
* Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
* Zero-Shot Monocular Scene Flow Estimation in the Wild
* Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
* Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model
* Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment
* Zero-Shot Styled Text Image Generation, but Make It Autoregressive
* ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
* ZeroVO: Visual Odometry with Minimal Assumptions
* Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation
* ZoomLDM: Latent Diffusion Model for multi-scale image generation
3725 for 2508

Last update:13-Jun-26 21:24:50
Use price@usc.edu for comments.