Update Dates 2410

2410
* *Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models
* *Affective Behavior Analysis In-the-Wild
* *AgriVision: Agriculture-Vision: Challenges and Opportunities for Computer Vision in Agriculture
* *AI City Challenge
* *AI for Space
* *AIS: Vision, Graphics and AI for Streaming
* *Autonomous Driving
* *Biometrics
* *Bridging the Gap Between Computational Photography and Visual Recognition
* *ChaLearn Face Anti-Spoofing
* *Challenge on Mobile Intelligent Photography and Imaging
* *Computational Color Imaging
* *Computer Vision for Microscopy Image Analysis
* *Computer Vision for Mixed Reality
* *Computer Vision for Physiological Measurement
* *Computer Vision in Sports
* *Continual Learning in Computer Vision
* *CVPR
* *Data Curation and Augmentation in Enhancing Medical Imaging Applications
* *Dataset Distillation for Computer Vision
* *Deep Learning for Geometric Computing
* *DeepFake Analysis and Detection
* *Domain adaptation, Explainability and Fairness in AI for Medical Image Analysis
* *EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
* *Efficient and On-Device Generation
* *Efficient Deep Learning for Computer Vision
* *Efficient Large Vision Models
* *Embedded Computer Vision
* *Evaluation of Generative Foundation Models
* *Fair, Data-Efficient and Trusted Computer Vision
* *Federated Learning for Computer Vision
* *Fine-grained Visual Categorization
* *Gaze Estimation and Prediction in the Wild
* *Generative Models for Computer Vision
* *Graphic Design Understanding and Generation
* *Human Motion Generation
* *Image Matching: Local Features and Beyond
* *Implicit Neural Representation for Vision
* *Learning 3D with Multi-View Supervision
* *Learning With Limited Labelled Data for Image and Video Understanding
* *Media Forensics
* *MetaFood Workshop
* *Mobile AI
* *Multimodal Algorithmic Reasoning Workshop
* *Multimodal Content Moderation
* *Multimodal Learning and Applications
* *Neural Architecture Search: Lightweight NAS Challenge (NAS)
* *Neural Rendering Intelligence
* *New frontiers for zero-shot Image Captioning Evaluation
* *New Trends in Image Restoration and Enhancement
* *Omnidirectional Computer Vision in Research and Industry
* *Open-Vocabulary 3D Scene Understanding
* *Perception Beyond the Visible Spectrum
* *Physics Based Vision Meets Deep Learning
* *Pixel-Level Video Understanding in the Wild Challenge
* *Precognition: Seeing Through the Future
* *Prompting in Vision
* *Representation Learning with Very Limited Images: Zero-shot, Unsupervised, and Synthetic Learning in the Era of Big Models
* *Rhobin Challenge: Reconstruction of Human-Object Interaction
* *Robot Visual Perception in Human Crowded Environments
* *Safe Artificial Intelligence for All Domains
* *SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets
* *Test-Time Adaptation: Model, Adapt Thyself!
* *Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics
* *Vision Datasets Understanding
* *Visual Anomaly and Novelty Detection
* *VOCVALC: Visual Odometry and Computer Vision Applications Based on Location Clues - With a Focus on Mobile Platform Applications
* *What is Next in Multimodal Foundation Models?
* *Women in Computer Vision
* *Workshop on Foundation Models: Foundation Model Challenge
* *Workshop on Scene Graphs and Graph Representation Learning
* *Workshop on Vision-Based Industrial Inspection
* 1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
* 2S-UDF: A Novel Two-Stage UDF Learning Method for Robust Non-Watertight Model Reconstruction from Multi-View Images
* 2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth Estimation
* 360+x: A Panoptic Multi-modal Scene Understanding Dataset
* 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
* 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-Device Queries
* 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
* 3D Clothed Human Reconstruction from Sparse Multi-View Images
* 3D Density Structure of the South China Sea Based on Wavelet Multi-Scale Analysis of Gravity Data and Its Tectonic Implications, The
* 3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
* 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
* 3D Facial Expressions through Analysis-by-Neural-Synthesis
* 3D Feature Tracking via Event Camera
* 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
* 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement
* 3D Human Pose Perception from Egocentric Stereo Videos
* 3D Human Scan With A Moving Event Camera
* 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data
* 3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation
* 3D Multi-frame Fusion for Video Stabilization
* 3D Neural Edge Reconstruction
* 3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
* 3D-Aware Face Editing via Warping-Guided Latent Direction Learning
* 3D-LFM: Lifting Foundation Model
* 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking
* 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
* 3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
* 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
* 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
* 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
* 3DInAction: Understanding Human Actions in 3D Point Clouds
* 3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-Labelling
* 3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
* 40-Year Time Series of Land Surface Emissivity Derived from AVHRR Sensors: A Fennoscandian Perspective, A
* 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
* 4D-DRESS: A 4D Dataset of Real-World Human Clothing with Semantic Annotations
* 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
* 4K4D: Real-Time 4D View Synthesis at 4K Resolution
* 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
* 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, The
* 8th AI City Challenge, The
* A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
* A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
* A2XP: Towards Private Domain Generalization
* AAMDM: Accelerated Auto-Regressive Motion Diffusion Model
* AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
* ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection
* Abductive Ego-View Accident Video Understanding for Safe Driving Perception
* Aberration Modulation Correlation Method for Dim and Small Space Target Detection
* Absolute Pose from One or Two Scaled and Oriented Features
* Accelerating Diffusion Sampling with Optimized Time Steps
* Accelerating Neural Field Training via Soft Mining
* Accept the Modality Gap: An Exploration in the Hyperbolic Space
* Accurate and Robust Object Detection via Selective Adversarial Learning With Constraints
* Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
* Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory
* ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection
* Achieving Reliable and Fair Skin Lesion Diagnosis via Unsupervised Domain Adaptation
* Achieving the Optimum Rate for Cross-Modal Source Coding
* ACT-Diffusion: Efficient Adversarial Consistency Training for One-Step Diffusion Models
* Action Detection via an Image Diffusion Process
* Action Scene Graphs for Long-Form Understanding of Egocentric Videos
* Action-Semantic Consistent Knowledge for Weakly-Supervised Action Localization
* Action-Slot: Visual Action-Centric Representations for Multi-Label Atomic Activity Recognition in Traffic Scenes
* Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle
* Active Domain Adaptation with False Negative Prediction for Object Detection
* Active Generalized Category Discovery
* Active Object Detection with Knowledge Aggregation and Distillation from Large Models
* Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
* Active Prompt Learning in Vision Language Models
* Active Transferability Estimation
* ActiveDC: Distribution Calibration for Active Finetuning
* Activity-Biometrics: Person Identification from Daily Activities
* ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
* AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
* Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
* Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
* Adapters Strike Back
* Adapting Few-Shot Classification via In-Process Defense
* Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
* Adapting the Segment Anything Model During Usage in Novel Situations
* Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
* Adapting Vision-Language Models via Learning to Inject Knowledge
* Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
* Adaptive and Interactive Multi-Level Spatio-Temporal Network for Traffic Forecasting
* Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
* Adaptive Correlation Filtering Method for Text-Based Person Search, An
* Adaptive Discriminative Regularization for Visual Classification
* Adaptive Dual-Channel Event-Triggered Fuzzy Control for Autonomous Underwater Vehicles With Multiple Obstacles Environment
* Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
* Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
* Adaptive Log-Euclidean Metrics for SPD Matrix Learning
* Adaptive LPU Decision for Dynamic Point Cloud Compression
* Adaptive Memory Replay for Continual Learning
* Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
* Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
* Adaptive Render-Video Streaming for Virtual Environments
* Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
* Adaptive Softassign via Hadamard-Equipped Sinkhorn
* Adaptive Speed Optimization Strategy at Signalized Intersections Based on the Penetration Rate of Connected Automated Vehicles
* Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning
* Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt
* AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
* AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor
* Addressing Background Context Bias in Few-Shot Segmentation Through Iterative Modulation
* Addressing Challenges of Incorporating Appearance Cues Into Heuristic Multi-Object Tracker via a Novel Feature Paradigm
* ADFactory: An Effective Framework for Generalizing Optical Flow With NeRF
* Advanced Facial Analysis in Multi-Modal Data with Cascaded Cross-Attention based Transformer
* Advancing Brain Tumor Analysis: Curating a High-Quality MRI Dataset for Deep Learning-Based Molecular Marker Profiling
* Advancing COVID-19 Detection in 3D CT Scans
* Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and Metrics
* Advancing CubeSats Capabilities: Ground-Based Calibration of Uvsq-Sat NG Satellite's NIR Spectrometer and Determination of the Extraterrestrial Solar Spectrum
* Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
* AdvDenoise: Fast Generation Framework of Universal and Robust Adversarial Patches Using Denoise
* Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
* Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
* Adversarial Identity Injection for Semantic Face Image Synthesis
* Adversarial Reweighting with α-Power Maximization for Domain Adaptation
* Adversarial Score Distillation: When Score Distillation Meets GAN
* Adversarial self-training for robustness and generalization
* Adversarial Text to Continuous Image Generation
* Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
* Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
* AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
* AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
* Affine Equivariant Networks Based on Differential Invariants
* Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
* AffordanceLLM: Grounding Affordance from Vision Language Models
* AFMSFFNet: An Anchor-Free-Based Feature Fusion Model for Ship Detection
* AgeDETR: Attention-Guided Efficient DETR for Space Target Detection
* Aggregation-Free Federated Learning for Tackling Data Heterogeneity, An
* AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning
* AHIVE: Anatomy-Aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
* AHOR: Online Multi-Object Tracking With Authenticity Hierarchizing and Occlusion Recovery
* AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art
* AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
* AIGC Image Quality Assessment via Image-Prompt Correspondence
* AIGC-VQA: A Holistic Perception Metric for AIGC Video Quality Assessment
* AIGeN: An Adversarial Approach for Instruction Generation in VLN
* AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment
* AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
* Aircraft Navigation in GNSS-Denied Environments via Radio SLAM With Terrestrial Signals of Opportunity
* AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
* AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results
* AKDC: Ambiguous Kernel Distance Clustering Algorithm for COVID-19 CT Scans Analysis
* Alchemist: Parametric Control of Material Properties with Diffusion Models
* ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
* Algorithm for Designing Waveforms Similar to Linear Frequency Modulation Using Polyphase-Coded Frequency Modulation
* Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
* Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With Text Feedback
* Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
* Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
* Aligning and Prompting Everything All at Once for Universal Visual Perception
* Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
* AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
* AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
* ALINA: Advanced Line Identification and Notation Algorithm
* All in One Framework for Multimodal Re-Identification in the Wild
* All Rivers Run to the Sea: Private Learning with Asymmetric Flows
* Alleviating Class Imbalance in Semi-Supervised Multi-Organ Segmentation via Balanced Subclass Regularization
* AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
* Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
* Alpha-CLIP: A CLIP Model Focusing on Wherever you Want
* AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
* AMHFN: Aggregation Multi-Hierarchical Feature Network for Hyperspectral Image Classification
* Amodal Completion via Progressive Mixed Context Diffusion
* Amodal Ground Truth and Completion in the Wild
* AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
* Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos
* analysis of best-practice strategies for replay and rehearsal in continual learning, An
* Analysis of Spatiotemporal Changes in Energy Consumption Carbon Emissions at District and County Levels Based on Nighttime Light Data: A Case Study of Jiangsu Province in China
* Analysis of the Grid Quantization for the Microwave Radar Coincidence Imaging Based on Basic Correlation Algorithm
* Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models
* Analyzing and Improving the Training Dynamics of Diffusion Models
* Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features
* Analyzing Temporal Characteristics of Winter Catch Crops Using Sentinel-1 Time Series
* Analyzing the Internals of Neural Radiance Fields
* Anatomically Constrained Implicit Face Models
* Anchor-based Robust Finetuning of Vision-Language Models
* ANIM: Accurate Neural Implicit Model for Human Reconstruction from a Single RGB-D Image
* AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming
* Animatable Gaussians: Learning Pose-Dependent Gaussian Maps for High-Fidelity Human Avatar Modeling
* Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
* Animating General Image with Large Visual Motion Model
* ANN-Based Filtering of Drone LiDAR in Coastal Salt Marshes Using Spatial-Spectral Features
* Annotated Dataset for Training Cloud Segmentation Neural Networks Using High-Resolution Satellite Remote Sensing Imagery
* Anomaly Heterogeneity Learning for Open-Set Supervised Anomaly Detection
* Anomaly Score: Evaluating Generative Models and Individual Generated Images Based on Complexity and Vulnerability
* AnonMAKE: Toward Secure and Anonymous Mutually Authenticated Key Exchange Protocol for Vehicular Communications
* Any-Shift Prompting for Generalization Over Distributions
* AnyDoor: Zero-shot Object-level Image Customization
* AnyScene: Customized Image Synthesis with Composited Foreground
* AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
* AodeMar: Attention-Aware Occlusion Detection of Vessels for Maritime Autonomous Surface Ships
* Aperiodic Coordination Scheduling of Multiple PPLs in Shipboard Integrated Power Systems
* APISR: Anime Production Inspired Real-World Anime Super-Resolution
* Applicability of Relatively Low-Cost Multispectral Uncrewed Aerial Systems for Surface Characterization of the Cryosphere
* Application of Direct and Indirect Methodologies for Beach Litter Detection in Coastal Environments
* Application of Remote Sensing and Explainable Artificial Intelligence (XAI) for Wildfire Occurrence Mapping in the Mountainous Region of Southwest China
* APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
* AR-CP: Uncertainty-Aware Perception in Adverse Conditions with Conformal Prediction and Augmented Reality For Assisted Driving
* Arbitrary Motion Style Transfer with Multi-Condition Motion Latent Diffusion Model
* Arbitrary-Scale Image Generation and Upsampling Using Latent Diffusion Model and Implicit Neural Decoder
* Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
* Are Deep Learning Models Pre-trained on RGB Data Good Enough for RGB-Thermal Image Retrieval?
* Are NeRFs ready for autonomous driving? Towards closing the real-to-simulation gap
* ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
* ARMedicalSketch: Exploring 3D Sketching for Medical Image Using True 2D-3D Interlinked Visualization and Interaction
* ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
* Arterial Signal Timing Based on Probe Vehicle Trajectories Under Cyclic Stochastic Demand
* Artificial Intelligence Failures in Autonomous Vehicles: Causes, Implications, and Prevention
* Artist-Friendly Relightable and Animatable Neural Heads
* ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
* ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models
* As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
* ASAM: Boosting Segment Anything Model with Adversarial Tuning
* ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
* Assessing Air Quality Dynamics during Short-Period Social Upheaval Events in Quito, Ecuador, Using a Remote Sensing Framework
* Assessing Evapotranspiration Changes in Response to Cropland Expansion in Tropical Climates
* Assessing the Impact of Agricultural Practices and Urban Expansion on Drought Dynamics Using a Multi-Drought Index Application Implemented in Google Earth Engine: A Case Study of the Oum Er-Rbia Watershed, Morocco
* Assessing the Performance of Efficient Face Anti-Spoofing Detection Against Physical and Digital Presentation Attacks
* Assessing the Potential of UAV for Large-Scale Fractional Vegetation Cover Mapping with Satellite Data and Machine Learning
* Assessing the Response of the Net Primary Productivity to Snow Phenology Changes in the Tibetan Plateau: Trends and Environmental Drivers
* Assessment of Habitat Quality in Arid Regions Incorporating Remote Sensing Data and Field Experiments
* Assessment of Hongtu-1 Multi-Static X-Band SAR Constellation Interferometry
* Assessment of Multiple Planetary Boundary Layer Height Retrieval Methods and Their Impact on PM2.5 and Its Chemical Compositions throughout a Year in Nanjing
* Assessment of Spatial Characterization Metrics for On-Orbit Performance of Landsat 8 and 9 Thermal Infrared Sensors
* Assessment of Systematic Errors in Mapping Electricity Access Using Night-Time Lights: A Case Study of Rwanda and Kenya
* AssistGUI: Task-Oriented PC Graphical User Interface Automation
* Association between Autism Spectrum Disorder and Environmental Quality in the United States
* Association between Built Environment and Bus Usage among Older Adults: Urban-Rural Differences in the Nonlinearities
* AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation
* Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing, An
* Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks
* Asymmetric Masked Distillation for Pre-Training Small Foundation Models
* Asymmetric Response of the Indonesian Throughflow to Co-Occurring El Niño-Southern Oscillation-Indian Ocean Dipole Events
* Asymptotic Feature Pyramid Network for Labeling Pixels and Regions
* Asynchronous Shuffled Frog-Leaping With Feasible Jaya Algorithm for Uncertain Task Rescheduling Problem in UAV Emergency Networks, An
* Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
* Atom-Level Optical Chemical Structure Recognition with Limited Supervision
* ATOM: Attention Mixer for Efficient Dataset Distillation
* Atomic Color: From Points to Probability Distributions
* Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
* Attention Calibration for Disentangled Text-to-Image Personalization
* Attention Guidance Distillation Network for Efficient Image Super-Resolution
* Attention-Based Layer Fusion and Token Masking for Weakly Supervised Semantic Segmentation
* Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
* Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
* Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
* Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
* AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
* AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts
* Audio Provenance Analysis in Heterogeneous Media Sets
* Audio Transformer for Synthetic Speech Detection via Multi-Formant Analysis
* Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective, The
* Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
* Audio-Visual Segmentation via Unlabeled Frame Exploitation
* Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
* AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
* AugData Distillation for Monocular 3D Human Pose Estimation
* Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition
* Augmenting Pass Prediction via Imitation Learning in Soccer Simulations
* Authentic Hand Avatar from a Phone Scan via Universal Hand Model
* Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
* Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
* AutoAD III: The Prequel: Back to the Pixels
* Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection
* Automated Recognition of Snow-Covered and Icy Road Surfaces Based on T-Net of Mount Tianshan
* Automatic Controllable Colorization via Imagination
* Automatic Correction of Time-Varying Orbit Errors for Single-Baseline Single-Polarization InSAR Data Based on Block Adjustment Model
* Automatic Fine Co-Registration of Datasets from Extremely High Resolution Satellite Multispectral Scanners by Means of Injection of Residues of Multivariate Regression
* Automatic Landslide Detection in Gansu, China, Based on InSAR Phase Gradient Stacking and AttU-Net
* Automatic Radar Intra-Pulse Signal Modulation Classification Using the Supervised Contrastive Learning
* Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor
* Automatic Water Body Extraction from SAR Images Based on MADF-Net
* Autonomous Single-Image Dehazing: Enhancing Local Texture with Haze Density-Aware Image Blending
* Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
* AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements
* AV-RIR: Audio-Visual Room Impulse Response Estimation
* AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
* AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
* AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
* AVID: Any-Length Video Inpainting with Diffusion Model
* Axis-Based Transformer UNet for RGB Remote Sensing Image Denoising
* AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
* BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
* BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives
* Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
* Backdoor Defense via Test-Time Detecting and Repairing
* Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives, A
* Backpropagation-free Network for 3D Test-time Adaptation
* BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
* BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
* Balancing Act: Distribution-Guided Debiasing in Diffusion Models
* BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
* Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
* BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
* Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
* Bayesian Approach to OOD Robustness in Image Classification, A
* Bayesian Differentiable Physics for Cloth Digitalization
* Bayesian Diffusion Models for 3D Shape Reconstruction
* Bayesian Exploration of Pre-Trained Models for Low-Shot Image Classification
* BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
* Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
* BEM: Balanced and Entropy-Based Mix for Long-Tailed Semi-Supervised Learning
* Benchmark Dataset and Pair-Wise Ranking Method for Quality Evaluation of Night-Time Image Enhancement
* Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
* Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM
* Benchmarking Object Detection Robustness against Real-World Corruptions
* Benchmarking Robustness in Neural Radiance Fields
* Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
* Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
* Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
* BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
* BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
* BEVRefiner: Improving 3D Object Detection in Bird's-Eye-View via Dual Refinement
* BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-Based Roadside 3D Object Detection
* Beyond Appearances: Material Segmentation with Embedded Spectral Information from RGB-D imagery
* Beyond Average: Individualized Visual Scanpath Prediction
* Beyond Deepfake Images: Detecting AI-Generated Videos
* Beyond Fairness in Computer Vision: A Holistic Approach to Mitigating Harms and Fostering Community-Rooted Computer Vision Research
* Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
* Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
* Beyond respiratory models: a physics-enhanced synthetic data generation method for 2D-3D deformable registration
* Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
* Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
* Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
* Beyond the Premier: Assessing Action Spotting Transfer Capability Across Diverse Domains
* Beyond the Screen: Evaluating Deepfake Detectors under Moiré Pattern Effects
* BGDNet: Background-guided Indoor Panorama Depth Estimation
* Bi-Causal: Group Activity Recognition via Bidirectional Causality
* Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
* Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-Based 3D Semantic Scene Completion
* Bidirectional Autoregressive Diffusion Model for Dance Generation
* Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
* BigEPIT: Scaling EPIT for Light Field Image Super-Resolution
* BigGait: Learning Gait Representation You Want by Large Vision Models
* Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness
* Bilateral Event Mining and Complementary for Event Stream Super-Resolution
* Bilateral Propagation Network for Depth Completion
* BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
* BiMAE: A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification
* Binarized Low-Light Raw Video Enhancement
* Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
* Bio-Optical Properties and Ocean Colour Satellite Retrieval along the Coastal Waters of the Western Iberian Coast (WIC)
* BioCLIP: A Vision Foundation Model for the Tree of Life
* Bipartite Graph-Based Projected Clustering With Local Region Guidance for Hyperspectral Imagery
* BiPer: Binary Neural Networks Using a Periodic Function
* Bitemporal Radiative Transfer Modeling Using Bitemporal 3D-Explicit Forest Reconstruction from Terrestrial Laser Scanning
* BiTT: Bi-Directional Texture Reconstruction of Interacting Two Hands from a Single Image
* BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
* Blind CFOs Estimation by Capon Method for Multi-User MIMO-OFDMA Uplink System
* Blind Image Quality Assessment Based on Geometric Order Learning
* Blind Image Quality Assessment Based on Perceptual Comparison
* Blind Localization and Clustering of Anomalies in Textures
* Blind Video Quality Prediction by Uncovering Human Video Perceptual Representation
* Block Selective Reprogramming for On-device Training of Vision Transformers
* Blockchain Assisted Intra-Twin and Inter-Twin Authentication Scheme for Vehicular Digital Twin System
* BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
* BLSAN: A Brain Lateralization-Guided Subject Adaptive Network for Motor Imagery Classification
* Blur-Aware Spatio-Temporal Sparse Transformer for Video Deblurring
* Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
* Blurry-Consistency Segmentation Framework with Selective Stacking on Differential Interference Contrast 3D Breast Cancer Spheroid
* BMAD: Benchmarks for Medical Anomaly Detection
* BodyMAP: Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
* Boosting Adversarial Training via Fisher-Rao Norm-Based Regularization
* Boosting Adversarial Transferability by Block Shuffle and Rotation
* Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
* Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
* Boosting Flow-based Generative Super-Resolution Models via Learned Prior
* Boosting Image Quality Assessment Through Efficient Transformer Adaptation with Local Feature Enhancement
* Boosting Image Restoration via Priors from Pre-Trained Models
* Boosting Neural Representations for Videos with a Conditional Decoder
* Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
* Boosting Order-Preserving and Transferability for Neural Architecture Search: A Joint Architecture Refined Search and Fine-Tuning Approach
* Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
* Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
* Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
* Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-Ray Expert Models
* Bootstrapping SparseFormers from Vision Foundation Models
* BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects
* BoQ: A Place is Worth a Bag of Learnable Queries
* BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
* Boundary-Aware Prototype in Semi-Supervised Medical Image Segmentation
* Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition
* Brain Decodes Deep Nets
* BrainWash: A Poisoning Attack to Forget in Continual Learning
* Braking Torque Distribution Reconfiguration Strategy of Vehicle With Faults of In-Wheel Motor Drive System
* Breathing Life Into Sketches Using Text-to-Video Priors
* Bridging Actions: Generate 3D Poses and Shapes In-Between Photos
* Bridging Domains in Melanoma Diagnostics: Predicting BRAF Mutations and Sentinel Lymph Node Positivity with Attention-Based Models in Histological Images
* Bridging Remote Sensors with Multisensor Geospatial Foundation Models
* Bridging the Gap Between End-to-End and Two-Step Text Spotting
* Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
* Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
* Bridging the Terrestrial Water Storage Anomalies between the GRACE/GRACE-FO Gap Using BEAST + GMDH Algorithm
* Bridging Visual and Textual Semantics: Towards Consistency for Unbiased Scene Graph Generation
* Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
* Brush2Prompt: Contextual Prompt Generator for Object Inpainting
* BSNet: Box-Supervised Simulation-Assisted Mean Teacher for 3D Instance Segmentation
* BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
* Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
* Building Bridges Across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
* Building Optimal Neural Architectures Using Interpretable Knowledge
* Building Secure and Engaging Video Communication by Using Monitor Illumination
* Building Vision-Language Models on Solid Foundations with Masked Distillation
* Burst Image Super-Resolution with Base Frame Selection
* Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
* Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
* C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation
* C2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
* C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
* C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
* CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
* CaBins: CLIP-based Adaptive Bins for Monocular Depth Estimation
* Cache and Reuse: Rethinking the Efficiency of On-device Transfer Learning
* Cache Me if You Can: Accelerating Diffusion Models through Block Caching
* CAD-SIGNet: CAD Language Inference from Point Clouds Using Layer-Wise Sketch Instance Guided Attention
* CAD: Photorealistic 3D Generation via Adversarial Distillation
* Cadastral-to-Agricultural: A Study on the Feasibility of Using Cadastral Parcels for Agricultural Land Parcel Delineation
* CaDeT: A Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving
* CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
* CAFF-DINO: Multi-spectral object detection transformers with cross-attention features fusion
* CAGE: Circumplex Affect Guided Expression Inference
* CAGE: Controllable Articulation GEneration
* CaKDP: Category-Aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
* Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers
* Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
* Calibration of Continual Learning Models
* Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark, A
* CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
* Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
* CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-Driven Video Editing
* Camera Motion Estimation from RGB-D-Inertial Scene Flow
* Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network
* CAMixerSR: Only Details Need More Attention
* Can Biases in ImageNet Models Explain Generalization?
* Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
* Can I Trust Your Answer? Visually Grounded Video Question Answering
* Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
* Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
* Can Synthetic Plant Images From Generative Models Facilitate Rare Species Identification and Classification?
* Can the accuracy bias by facial hairstyle be reduced through balancing the training data?
* Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
* CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
* CapHuman: Capture Your Moments in Parallel Universes
* CapsFusion: Rethinking Image-Text Data at Scale
* Capturing Closely Interacted Two-Person Motions with Reaction Priors
* Carbon and Energy Balance in a Primary Amazonian Forest and Its Relationship with Remote Sensing Estimates
* Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
* CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
* Cascading Landslide: Kinematic and Finite Element Method Analysis through Remote Sensing Techniques
* CASR: Efficient Cascade Network Structure with Channel Aligned method for 4K Real-Time Single Image Super-Resolution
* CAT-DM: Controllable Accelerated Virtual Try-On with Diffusion Model
* CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
* CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
* Category Agnostic Model for Visual Rearrangment, A
* Category-Adaptive Label Discovery and Noise Rejection for Multi-Label Recognition With Partial Positive Labels
* Category-Aware Curriculum Learning for Data-Free Knowledge Distillation, A
* Category-Contextual Relation Encoding Network for Few-Shot Object Detection
* Category-Level Multi-Part Multi-Joint 3D Shape Assembly
* Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
* Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models
* CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
* CBDMoE: Consistent-but-Diverse Mixture of Experts for Domain Generalization
* CCEdit: Creative and Controllable Video Editing via Diffusion Models
* CD-BASA: An Efficient Cross-Domain Batch Authentication Scheme Based on Blockchain With Accumulator for VANETs
* CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery
* CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
* CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
* CenterPoint Transformer for BEV Object Detection with Automotive Radar
* Centimeter-Level Indoor Positioning With Facing Direction Detection for Microlocation-Aware Services
* Central Difference Variational Filtering Based on Conjugate Gradient Method for Distributed Imaging Application
* CFAT: Unleashing Triangular Windows for Image Super-resolution
* CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-Spoofing
* CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
* CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
* ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Image
* Change in Fractional Vegetation Cover and Its Prediction during the Growing Season Based on Machine Learning in Southwest China
* Change Representation and Extraction in Stripes: Rethinking Unsupervised Hyperspectral Image Change Detection With an Untrained Network
* Changes in Anthropogenic Aerosols during the First Wave of COVID-19 Lockdowns in the Context of Long-Term Historical Trends at 51 AERONET Stations
* Changes in Vegetation Cover and the Relationship with Surface Temperature in the Cananéia-Iguape Coastal System, São Paulo, Brazil
* Channel Contrastive Attention-Based Local-Nonlocal Mutual Block on Super-Resolution, A
* Channel-Robust RF Fingerprint Identification Using Multi-Task Learning and Receiver Collaboration
* Character Position-Aware Compression Framework for Screen Text Image, A
* Characteristics Matching Based Hash Codes Generation for Efficient Fine-Grained Image Retrieval
* Characteristics of Air Traffic Flow in Terminal Airspace: A Multiplex Recurrence Network Analysis
* Characteristics of Precipitation with and without Bright Band in Summer Tibetan Plateau and Central-Eastern China, The
* Characterizing the Role of Geospatial Science in Digital Twins
* Characterizing the Supercooled Cloud over the TP Eastern Slope in 2016 via Himawari-8 Products
* Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
* Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
* ChatPose: Chatting about 3D Human Pose
* ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
* ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
* Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
* China Coastal Front from Himawari-8 AHI SST Data: Part 2: South China Sea, The
* Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
* Chromapose: Robustness of 2D Pose Estimation Under Different Color Illuminations
* Cinematic Behavior Transfer via NeRF-based Differentiable Filming
* CHAIN: Enhancing Generalization in Data-Efficient GANs via LipsCHitz Continuity ConstrAIned Normalization
* Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
* CityDreamer: Compositional Generative Model of Unbounded 3D Cities
* CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
* Class Incremental Learning with Multi-Teacher Distillation
* Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation
* Class Tokens Infusion for Weakly Supervised Semantic Segmentation
* Class-Incremental Mixture of Gaussians for Deep Continual Learning
* Class-Specific Thresholding for Imbalanced Semi-Supervised Learning
* Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
* Classification and Mapping of Fuels in Mediterranean Forest Landscapes Using a UAV-LiDAR System and Integration Possibilities with Handheld Mobile Laser Scanner Systems
* Classification of 2D Ultrasound Breast Cancer Images with Deep Learning
* Classification of Lung Nodules on CT via Pseudo-colour Images and Deep Features from Pre-trained Convolutional Networks
* Classification of Small Targets on Sea Surface Based on Improved Residual Fusion Network and Complex Time-Frequency Spectra
* Classifier Guided Cluster Density Reduction for Dataset Selection
* CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
* CLiC: Concept Learning in Context
* Click, Crop & Detect: One-Click Offline Annotation for Human-in-the-Loop 3D Object Detection on Point Clouds
* CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
* CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
* CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
* CLIP-KD: An Empirical Study of CLIP Model Distillation
* CLIPtone: Unsupervised Learning for Text-Based Image Tone Adjustment
* CLOAF: CoLlisiOn-Aware Human Flow
* Clockwork Diffusion: Efficient Generation With Model-Step Distillation
* Close Imitation of Expert Retouching for Black-and-White Photography
* Closed-Form, Pairwise Solution to Local Non-Rigid Structure-From-Motion, A
* Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
* Closer Look at Spatial-Slice Features Learning for COVID-19 Detection, A
* Closer Look at the Few-Shot Adaptation of Large Vision-Language Models, A
* Cloud-Device Collaborative Learning for Multimodal Large Language Models
* CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
* Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking
* Cluster Triplet Loss for Unsupervised Domain Adaptation on Histology Images
* Cluster-Based Wood-Leaf Separation Method for Forest Plots Using Terrestrial Laser Scanning Data
* Clustering for Protein Representation Learning
* Clustering Propagation for Universal Medical Image Segmentation
* CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
* CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels
* CMVDE: Consistent Multi-View Video Depth Estimation via Geometric-Temporal Coupling Approach
* CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-View Images
* CNC-Net: Self-Supervised Learning for CNC Machining Operations
* CNN Classification of Computed Tomography Images for Pancreatic Tumor Detection
* Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN
* Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
* Coalitional Game-Theoretic Paradigm for Power Allocation in Distributed Antenna Systems
* Coarse or Fine? Recognising Action End States without Labels
* Coarse-to-Fine Deep Learning Based Framework for Traffic Light Recognition, A
* Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
* Coarse-to-fine Two-stage Helmet Detection Method for Motorcyclists, A
* Coastal Sea Ice Concentration Derived from Marine Radar Images: A Case Study from Utqiagvik, Alaska
* Coastal Storm-Induced Sinkholes: Insights from Unmanned Aerial Vehicle Monitoring
* Coastal Vulnerability Index (CVI) Assessment: Evaluating Risks Associated with Human-Made Activities along the Limassol Coastline, Cyprus
* CoBEV: Elevating Roadside 3D Object Detection With Depth and Height Complementarity
* COCONut: Modernizing COCO Segmentation
* CoDe: An Explicit Content Decoupling Framework for Image Restoration
* Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
* Codebook VQ-VAE Approach for Prostate Cancer Diagnosis using Multiparametric MRI
* CodedBGT: Code Bank-Guided Transformer for Low-Light Image Enhancement
* CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
* CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
* CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
* CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
* Coding self-representative and label-relaxed hashing for cross-modal retrieval
* CoDISP: Exploring Compressed Domain Camera ISP with RGB-guided Encoder
* CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering
* CogAgent: A Visual Language Model for GUI Agents
* CoGS: Controllable Gaussian Splatting
* Coherence as Texture: Passive Textureless 3D Reconstruction by Self-Interference
* Coherent Temporal Synthesis for Incremental Action Segmentation
* CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation
* Collaborating Foundation Models for Domain Generalized Semantic Segmentation
* Collaborative Blind Image Deblurring
* Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
* Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
* Collaborative Visual Place Recognition through Federated Learning
* Collective Matrix Completion via Graph Extraction
* Collision-Free Formation Control for Heterogeneous Multiagent Systems Under DoS Attacks
* COLMAP-Free 3D Gaussian Splatting
* Color Difference in Context: an Experiment
* Color Shift Estimation-and-Correction for Image Enhancement
* Color-cued Efficient Densification Method for 3D Gaussian Splatting
* ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion
* Combining Frame and GOP Embeddings for Neural Video Representation
* Combining KAN with CNN: KonvNeXt's Performance in Remote Sensing and Patent Insights
* Common Canvas: Open Diffusion Models Trained on Creative-Commons Images
* Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
* Communication-Efficient Collaborative Perception via Information Filling with Codebook
* Communication-Efficient Federated Learning with Accelerated Client Gradient
* Commutative Encryption and Reversible Watermarking Algorithm for Vector Maps Based on Virtual Coordinates
* Compact 3D Gaussian Representation for Radiance Field
* Comparative Analysis and Optimal Selection of Calibration Functions in Pure Rotational Raman Lidar Technique
* Comparative Analysis of Driver Overtaking Behavior Near Low-Speed Automated Vehicles and Human-Driven Vehicles
* Comparative Analysis of Generalization and Harmonization Methods for 3D Brain fMRI Images: A Case Study on OpenBHB Dataset
* Comparative Analysis of Implicit Augmentation Techniques for Breast Cancer Diagnosis Using Multiple Views, A
* Comparative Study on the Vertical Column Concentration Inversion Algorithm of Tropospheric Trace Gas Based on the MAX-DOAS Measurement Spectrum
* Compare and Focus: Multi-Scale View Aggregation for Crowd Counting
* Comparing Link Budget Requirements for Future Space-Based Interferometers
* Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
* Comparison and Analysis of Three Methods for Dynamic Height Error Correction in GNSS-IR Sea Level Retrievals
* Comparison and Optimization of Light Use Efficiency-Based Gross Primary Productivity Models in an Agroforestry Orchard
* Comparison of Soil Water Content from SCATSAR-SWI and Cosmic Ray Neutron Sensing at Four Agricultural Sites in Northern Italy: Insights from Spatial Variability and Representativeness
* Comparison of the Morrison and WDM6 Microphysics Schemes in the WRF Model for a Convective Precipitation Event in Guangdong, China, Through the Analysis of Polarimetric Radar Data
* Comparison of Time-Lapse Ground-Penetrating Radar and Electrical Resistivity Tomography Surveys for Detecting Pig (Sus spp.) Cadaver Graves in an Australian Environment
* Compensating Acquisition Footprint for Amplitude-Preserving Angle Domain Common Image Gathers Based on 3D Reverse Time Migration
* Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
* Complex Background SAR Ship Target Detection Method Based on Fusion Tensor and Cross-Domain Adversarial Learning, A
* Complex Discontinuity Structure Beneath the Changbaishan-Tianchi Volcano Revealed by the P-Wave Coda Autocorrelation Method Based on Dense Seismic Array
* Complex Permittivity of Adobe Versus Frequency and Water Content
* Complex Style Image Transformations for Domain Generalization in Medical Images
* Composed Video Retrieval via Enriched Context and Discriminative Embeddings
* Composing Object Relations and Attributes for Image-Text Matching
* Compositional Chain-of-Thought Prompting for Large Multimodal Models
* Compositional Video Understanding with Spatiotemporal Structure-based Transformers
* Comprehensive Analysis of Factors Impacting Membership Inference, A
* Comprehensive Comparison of Far-Field and Near-Field Imaging Radiometry in Synthetic Aperture Interferometry, A
* Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
* Compressed Line Spectral Estimation Using Covariance: A Sparse Reconstruction Perspective
* Computational Spectral Imaging with Unified Encoding Model and Beyond
* Computationally Efficient Approach for Acquisition and Doppler Tracking for PNT With LEO Megaconstellations, A
* Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
* ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
* ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
* CONDA: Continual Unsupervised Domain Adaptation Learning in Visual Perception for Self-Driving Cars
* Condition-Aware Neural Network for Controlled Image Generation
* Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling, A
* Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis
* CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
* Conformal Semantic Image Segmentation: Post-hoc Quantification of Predictive Uncertainty
* Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
* Connecting NeRFs, Images, and Text
* ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization
* Consideration of Human Vision in Crowd Simulations
* Considering the Effects of Horizontal Heterogeneities in Satellite-Based Large-Scale Statistics of Cloud Optical Properties
* ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
* Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
* Consistent GT-Proposal Assignment for Challenging Pedestrian Detection
* Consistent Prompting for Rehearsal-Free Continual Learning
* Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
* ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion
* Constrained 3-D Trajectory Planning for Aerial Vehicles Without Range Measurement
* Constrained Layout Generation with Factor Graphs
* Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation
* Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
* Construction and Inference Method of Semantic-Driven, Spatio-Temporal Derivation Relationship Network for Place Names
* Consumer-Oriented Image Transformation Scheme With a Secret Key for Privacy Protection, A
* Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
* Content-aware Input Scaling and Deep Learning Computation Offloading for Low-Latency Embedded Vision
* Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
* ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
* Context-Aware DGCN-Based Ship Formation Recognition in Remote Sensing Images
* Context-Aware Integration of Language and Visual References for Natural Language Tracking
* Context-aware Video Anomaly Detection in Long-Term Datasets
* Context-Based and Diversity-Driven Specificity in Compositional Zero-Shot Learning
* Context-Guided Spatio-Temporal Video Grounding
* ContextMatcher: Detector-Free Feature Matching With Cross-Modality Context
* Contextrast: Contextual Contrastive Learning for Semantic Segmentation
* ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
* Contextual Augmented Global Contrast for Multimodal Intent Recognition
* Contextualising Implicit Representations for Semantic Tasks
* Continual Cross-Domain Image Compression via Entropy Prior Guided Knowledge Distillation and Scalable Decoding
* Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
* Continual Forgetting for Pre-Trained Vision Models
* Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
* Continual Learning with Weight Interpolation
* Continual Segmentation with Disentangled Objectness Learning and Class Recognition
* Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning
* Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
* Continual-Zoo: Leveraging Zoo Models for Continual Classification of Medical Images
* Continuous Monitoring of Forests in Wetland Ecosystems with Remote Sensing and Probability Sampling
* Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
* Continuous Pose for Monocular Cameras in Neural Implicit Representation
* Contraction mapping of feature norms for data quality imbalance learning
* Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
* Contrastive Clothing and Pose Generation for Cloth-Changing Person Re-Identification
* Contrastive Denoising Score for Text-Guided Latent Diffusion Image Editing
* Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
* Contrastive Learning for Lane Detection via cross-similarity
* Contrastive Mean-Shift Learning for Generalized Category Discovery
* Contrastive Open-Set Active Learning-Based Sample Selection for Image Classification
* Contrastive Pedestrian Attentive and Correlation Learning Network for Occluded Person Re-Identification
* Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
* Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes
* Contrastive Transformer Network for Track Segment Association with Two-Stage Online Method
* Contribution of Climatic Change and Human Activities to Vegetation Dynamics over Southwest China during 2000-2020
* Control4D: Efficient 4D Portrait Editing With Text
* ControlPolypNet: Towards Controlled Colon Polyp Synthesis for Improved Polyp Segmentation
* ControlRoom3D: Room Generation Using Semantic Proxy Rooms
* Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets
* Conversational Short-Phrase Speaker Diarization via Self-Adjusting Speech Segmentation and Embedding Extraction
* Converting Urban Trips to Multi-Dimensional Signals to Improve Trip Purpose Inference
* Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning
* ConvNet-HIDE: Deep-Learning-Based Dual Watermarking for Health-Care Images
* ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
* Convolutional Neural Networks (CNN) with Quantum-Behaved Particle Swarm Optimization (QPSO)-Based Medical Image Fusion
* Convolutional Prompting meets Language Models for Continual Learning
* COOD: Combined out-of-distribution detection using multiple measures for anomaly & novel class detection in large-scale hierarchical classification
* Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
* Cooperative Localization of Asynchronous AUVs With Compensation for the Acoustic Wave Bends
* Coordination Graph Based Framework for Network Traffic Signal Control, A
* CoralSCOP: Segment any COral Image on this Planet
* CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
* CORES: Convolutional Response-based Score for Out-of-distribution Detection
* Coreset Selection for Object Detection
* Correcting Diffusion Generation Through Resampling
* Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
* Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
* Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
* CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
* CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation
* Cosalpure: Learning Concept from Group Images for Robust Co-Saliency Detection
* CoSeR: Bridging Image and Language for Cognitive Super-Resolution
* CosmicMan: A Text-to-Image Foundation Model for Humans
* COTR: Compact Occupancy TRansformer for Vision-Based 3D Occupancy Prediction
* Countering Personalized Text-to-Image Generation with Influence Watermarks
* County-Level Cultivated Land Quality Evaluation Using Multi-Temporal Remote Sensing and Machine Learning Models: From the Perspective of National Standard
* Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
* Coupling Light Intensity and Hyperspectral Reflectance Improve Estimations of the Actual Electron Transport Rate of Mango Leaves (Mangifera indica L.)
* COVER: A Comprehensive Video Quality Evaluator
* CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
* CPI-Parser: Integrating Causal Properties Into Multiple Human Parsing
* CPINet: Towards A Novel Cross-Polarimetric Interaction Network for Dual-Polarized SAR Ship Classification
* CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
* CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
* CPR-Coach: Recognizing Composite Error Actions Based on Single-Class Training
* CPR: Retrieval Augmented Generation for Copyright Protection
* CRAUnet++: A New Convolutional Neural Network for Land Surface Water Extraction from Sentinel-2 Imagery by Combining RWI with Improved Unet++
* Creating a Digital Twin of Spinal Surgery: A Proof of Concept
* CricaVPR: Cross-Image Correlation-Aware Representation Learning for Visual Place Recognition
* Critical Review of Subway Train Timetabling and Rescheduling Problems, A
* CRKD: Enhanced Camera-Radar Object Detection with Cross-Modality Knowledge Distillation
* CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task
* CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
* CroSpace6D: Leveraging Geometric and Motion Cues for High-Precision Cross-Domain 6DoF Pose Estimation for Non-Cooperative Spacecrafts
* Cross Initialization for Face Personalization of Text-to-Image Models
* Cross-Attention Regression Flow for Defect Detection
* Cross-Dataset Study for Text-based 3D Human Motion Retrieval, A
* Cross-dimension Affinity Distillation for 3D EM Neuron Segmentation
* Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
* Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding
* Cross-Modal Feature Fusion-Based Knowledge Transfer for Text-Based Person Search
* Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
* Cross-Modal Quantization for Co-Speech Gesture Generation
* Cross-Modal Self-Training: Aligning Images and Pointclouds to learn Classification without Labels
* Cross-Modality Vessel Re-Identification With Deep Alignment Decomposition Network
* Cross-Scale Attention for Long-Term Time Series Forecasting
* Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series
* Cross-spectral Gated-RGB Stereo Depth Estimation
* Cross-Subject EEG Feedback for Implicit Image Generation
* Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches
* Cross-view Aggregation Network For Stereo Image Super-Resolution
* Cross-View and Cross-Pose Completion for 3D Human Understanding
* CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model
* CrossKD: Cross-Head Knowledge Distillation for Object Detection
* CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
* Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis
* CrowdDiff: Multi-Hypothesis Crowd Density Estimation Using Diffusion Models
* CSCO: Connectivity Search of Convolutional Operators
* CSMNER: A Toponym Entity Recognition Model for Chinese Social Media
* CSTA: CNN-based Spatiotemporal Attention for Video Summarization
* CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
* Cultural Heritage in Times of Crisis: Damage Assessment in Urban Areas of Ukraine Using Sentinel-1 SAR Data
* Cumulative and Lagged Effects: Seasonal Characteristics of Drought Effects on East Asian Grasslands
* Cumulative Rainfall Radar Recalibration with Rain Gauge Data Using the Colour Pattern Regression Algorithm QGIS Plugin
* Current Status of Remote Sensing for Studying the Impacts of Hurricanes on Mangrove Forests in the Coastal United States
* Curriculum Learning Driven Domain Adaptation for Low-Resource Machine Reading Comprehension
* Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
* CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
* CurveCloudNet: Processing Point Clouds with 1D Structure
* Customization Assistant for Text-to-image Generation
* Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
* CustomListener: Text-Guided Responsive Interaction for User-Friendly Listening Head Generation
* CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
* CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
* CWSCNet: Channel-Weighted Skip Connection Network for Underwater Object Detection
* CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
* CycleGANAS: Differentiable Neural Architecture Search for CycleGAN
* CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
* Cyclic Learning for Binaural Audio Generation and Localization
* D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
* D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
* D4M: Dataset Distillation via Disentangled Diffusion Model
* DaFF: Dual Attentive Feature Fusion for Multispectral Pedestrian Detection
* Damage Detection and Localization by Learning Deep Features of Elastic Waves in Piezoelectric Ceramic Using Point Contact Method
* Damage Scene Change Detection Based on Infrared Polarization Imaging and Fast-PCANet
* DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
* DanceComposer: Dance-to-Music Generation Using a Progressive Conditional Music Generator
* Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
* DAP: A Dynamic Adversarial Patch for Evading Person Detectors
* DaReNeRF: Direction-aware Representation for Dynamic Scenes
* DART: Implicit Doppler Tomography for Radar Novel View Synthesis
* Data Poisoning Based Backdoor Attacks to Contrastive Learning
* Data Valuation and Detections in Federated Learning
* Data-Driven Calibration of SWOT's Systematic Errors: First In-Flight Assessment
* Data-Driven Cooperative Differential Game Based Energy Management Strategy for Hybrid Electric Propulsion System of a Flying Car
* Data-Driven Image-Based Visual Servoing Scheme for Redundant Manipulators With Unknown Structure and Singularity Solution, A
* Data-Efficient and Robust Task Selection for Meta-Learning
* Data-Efficient Multimodal Fusion on a Single GPU
* Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
* Data-free Defense of Black Box Models Against Adversarial Attacks
* Data-free Model Fusion with Generator Assistants
* Data-Free Quantization via Pseudo-label Filtering
* Dataset condensation with latent quantile matching
* DAVE: A Detect-and-Verify Paradigm for Low-Shot Counting
* Day-Night Cross-domain Vehicle Re-identification
* DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction
* DCE-diff: Diffusion Model for Synthesis of Early and Late Dynamic Contrast-Enhanced MR Images from Non-Contrast Multimodal Inputs
* DDOS: The Drone Depth and Obstacle Segmentation Dataset
* De-Confounded Data-Free Knowledge Distillation for Handling Distribution Shifts
* De-Diffusion Makes Text a Strong Cross-Modal Interface
* De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search
* DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
* DECA-Net: Dual encoder and cross-attention fusion network for surgical instrument segmentation
* Decentralized Directed Collaboration for Personalized Federated Learning
* Deciphering What and Where Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
* DECNet: A Non-Contacting Dual-Modality Emotion Classification Network for Driver Health Monitoring
* Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
* Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-Training Framework
* Decomposition-Based Multiobjective Evolutionary Optimization With Tabu Search for Dynamic Pickup and Delivery Problems
* DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
* DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
* Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One
* Decoupled Pseudo-Labeling for Semi-Supervised Monocular 3D Object Detection
* Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
* Dedicated Inference Engine and Binary-Weight Neural Networks for Lightweight Instance Segmentation
* DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector
* Deep Biclustering Framework for Brain Network Analysis, A
* Deep Equilibrium Diffusion Restoration with Parallel Sampling
* Deep Generative Data Assimilation in Multimodal Setting
* Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
* Deep Hybrid Fusion Network for Inverse Synthetic Aperture Radar Ship Target Recognition Using Multi-Domain High-Resolution Range Profile Data
* Deep Imbalanced Regression via Hierarchical Classification Adjustment
* Deep Learning Approach for Driver Speed Intention Recognition Based on Naturalistic Driving Data
* Deep learning approach to pedestrian detection and path prediction
* Deep Learning Era for Computer Vision-Based Eye Gaze Tracking: An Intensive Model
* Deep Learning for Table Detection and Structure Recognition: A Survey
* Deep Learning Methods for Calibrated Photometric Stereo and Beyond
* Deep Learning-Based Identification of Arctic Ocean Boundaries and Near-Surface Phenomena in Underwater Echograms
* Deep NRSFM for multi-view multi-body pose estimation
* Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey
* Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
* Deep Reinforcement Learning-Based Adaptive Computation Offloading and Power Allocation in Vehicular Edge Computing Networks
* Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
* Deep Tectonic Environment Analysis of the Lingshan Conjugate Earthquake within the Qinzhou Fold Belt, South China: Insights Derived from 3D Resistivity Structure Model
* Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation
* Deep Variational Network Toward Blind Image Restoration
* Deep Video Codec Control for Vision Models
* Deep Video Inverse Tone Mapping Based on Temporal Clues
* Deep-Learning Approach to Detect and Classify Heavy-Duty Trucks in Satellite Images, A
* Deep-Learning Network for Wheat Yield Prediction Combining Weather Forecasts and Remote Sensing Data, A
* Deep-TROJ: An Inference Stage Trojan Insertion Algorithm Through Efficient Weight Replacement Attack
* DeepCache: Accelerating Diffusion Models for Free
* DeepDistAL: Deepfake Dataset Distillation using Active Learning
* Deepfake Catcher: Can a Simple Fusion be Effective and Outperform Complex DNNs?
* DeepLocalization: Using change point detection for Temporal Action Localization
* DeepMesh: Differentiable Iso-Surface Extraction
* DeepVCA: Deep Video Complexity Analyzer
* Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
* Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
* Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
* Deformable One-Shot Face Stylization via DINO Semantic Guidance
* Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
* DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer
* DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning
* DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
* DELTA: Decoupling Long-Tailed Online Continual Learning
* Delving Into Important Samples of Semi-Supervised Old Photo Restoration: A New Dataset and Method
* Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
* DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
* DemoCaricature: Democratising Caricature Generation with a Rough Sketch
* DemoFusion: Democratising High-Resolution Image Generation With No $$
* Demographic Bias Effects on Face Image Synthesis
* DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
* Dempster-Shafer Enhanced Framework for Urban Road Planning Using a Model-Based Digital Twin and MCDM Techniques, A
* Denoising of Photon-Counting LiDAR Bathymetry Based on Adaptive Variable OPTICS Model and Its Accuracy Assessment
* Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
* Dense Optical Tracking: Connecting the Dots
* Dense Vision Transformer Compression with Few Samples
* Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction
* Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
* Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
* Deploying Machine Learning Anomaly Detection Models to Flight Ready AI Boards
* DepressionMLP: A Multi-Layer Perceptron Architecture for Automatic Depression Level Prediction via Facial Keypoints and Action Units
* DePT: Decoupled Prompt Tuning
* Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
* Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing
* Depth Prompting for Sensor-Agnostic Depth Estimation
* Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
* Depth-Aware Test-Time Training for Zero-Shot Video Object Segmentation
* Depth-Guided Robust Point Cloud Fusion NeRF for Sparse Input Views
* Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images
* DepthVoting: A Few-Shot Point Cloud Classification Model Incorporating a Projection-Based Voting Mechanism
* Describing Differences in Image Sets with Natural Language
* Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
* Desertification Mitigation in Northern China Was Promoted by Climate Drivers after 2000
* Desigen: A Pipeline for Controllable Design Template Generation
* Design and Analysis of a Moon-Based Earth-Radiation Measurement System
* Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition
* Design of Intelligent Control Under Machine Learning Supervision and Signal Compression Mechanism Design for NCSs Under DoS Attacks
* Design2Cloth: 3D Cloth Generation from 2D Masks
* Designing High Speed Weigh-in-Motion System With Principal Component Regression in Wallonia (Belgium) Toward Direct Weight Enforcement
* DetCLIPv3: Towards Versatile Generative Open-Vocabulary Object Detection
* DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
* Detecting Hailstorms in China from FY-4A Satellite with an Ensemble Machine Learning Model
* Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models
* Detecting Traffic Anomalies During Extreme Events via a Temporal Self-Expressive Model
* Detection and Statistics System of Pavement Distresses Based on Street View Videos
* Detection of Maize Seedling Quality from UAV Images Based on Deep Learning and Voronoi Diagram Algorithms, The
* Detection of Wet Snow by Weakly Supervised Deep Learning Change Detection Algorithm with Sentinel-1 Data
* Detector-Free Structure from Motion
* Determinants of Intra-City Residential Migration Patterns of Older Adults: A GIS and Decision Tree Analysis of Yancheng City, China
* Determination of Microtopography of Low-Relief Tidal Freshwater Forested Wetlands Using LiDAR
* Determination of Optimum Coordinate Transformation Parameters for GNSS and LiDAR-Based Localization in Automated Vehicles
* Detours for Navigating Instructional Videos
* DETR-ORD: An Improved DETR Detector for Oriented Remote Sensing Object Detection with Feature Reconstruction and Dynamic Query
* DETRs Beat YOLOs on Real-time Object Detection
* Development of a Background Filtering Algorithm to Improve the Accuracy of Determining Underground Cavities Using Multi-Channel Ground-Penetrating Radar and Deep Learning
* Development of a UAS-Based Multi-Sensor Deep Learning Model for Predicting Napa Cabbage Fresh Weight and Determining Optimal Harvest Time
* Device-Wise Federated Network Pruning
* devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol, The
* Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing, The
* Devil is in the Fine-Grained Details: Evaluating open-Vocabulary Object Detectors for Fine-Grained Understanding, The
* Dexterous Grasp Transformer
* DFIE3D: 3D-Aware Disentangled Face Inversion and Editing via Facial-Contrastive Learning
* Dformer: Learning Efficient Image Restoration with Perceptual Guidance
* DGBD: Depth Guided Branched Diffusion for Comprehensive Controllability in Multi-View Generation
* DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
* DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis
* DIA: Diffusion based Inverse Network Attack on Collaborative Inference
* Diabetic Retinopathy (DR) Image Synthesis Using DCGAN and Classification of DR Using Transfer Learning Approaches
* Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
* DiaLoc: An Iterative Approach to Embodied Dialog Localization
* DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
* DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes
* DIEM: Decomposition-Integration Enhancing Multimodal Insights
* Diff-BGM: A Diffusion Model for Video Background Music Generation
* Diff-Plugin: Revitalizing Details for Diffusion-Based Low-Level Tasks
* DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
* DiffAM: Diffusion-Based Adversarial Makeup Transfer for Facial Privacy Protection
* DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
* DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
* DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
* DiffEditor: Boosting Accuracy and Flexibility on Diffusion-Based Image Editing
* Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
* Difference-Aware Distillation for Semantic Segmentation
* Differentiable Display Photometric Stereo
* Differentiable Information Bottleneck for Deterministic Multi-View Clustering
* Differentiable Micro-Mesh Construction
* Differentiable Neural Surface Refinement for Modeling Transparent Objects
* Differentiable Point-Based Inverse Rendering
* DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
* DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
* DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation
* DiffLight: Integrating Content and Detail for Low-light Image Enhancement
* DiffLoc: Diffusion Model for Outdoor LiDAR Localization
* DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
* DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
* DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
* DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-Based Human Video Generation
* DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
* DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
* DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model
* DiffSeg: Towards Detecting Diffusion-Based Inpainting Attacks Using Multi-Feature Segmentation
* DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-Driven Holistic 3D Expression and Gesture Generation
* DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
* DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
* Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
* Diffusemix: Label-Preserving Data Augmentation with Diffusion Models
* Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
* Diffusion Augmentation and Pose Generation Based Pre-Training Method for Robust Visible-Infrared Person Re-Identification
* Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
* Diffusion Model Alignment Using Direct Preference Optimization
* Diffusion Models Without Attention
* Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
* Diffusion Time-step Curriculum for One Image to 3D Generation
* Diffusion-Based Adaptation for Classification of Unknown Degraded Images
* Diffusion-Based Adversarial Purification for Speaker Verification
* Diffusion-based Blind Text Image Super-Resolution
* Diffusion-Driven GAN Inversion for Multi-Modal Face Image Generation
* Diffusion-EDFs: Bi-Equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
* Diffusion-ES: Gradient-Free Planning with Diffusion for Autonomous and Instruction-Guided Driving
* Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
* DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
* DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
* DiffusionLight: Light Probes for Free by Painting a Chrome Ball
* DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
* DiffusionPoser: Real-Time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
* DiffusionRegPose: Enhancing Multi-Person Pose Estimation Using a Diffusion-Based End-to-End Regression Approach
* DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
* DiG-IN: Diffusion Guidance for Investigating Networks: Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
* Digital Life Project: Autonomous 3D Characters with Social Intelligence
* Digital Superresolution Method With Minimal Sensitivity to Shift Estimation Error, A
* Digital Twins for Research and Innovation in Support of the European Green Deal Data Space: A Systematic Review
* DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
* DIMAT: Decentralized Iterative Merging-and-Training for Deep Learning Models
* DIOD: Self-Distillation Meets Object Discovery
* DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
* DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
* Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
* Disco: Disentangled Control for Realistic Human Dance Generation
* Discontinuity-preserving Normal Integration with Auxiliary Edges
* Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
* Discovering and Mitigating Visual Biases Through Keyword Explanation
* Discovering interpretable models of scientific image data with deep learning
* Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
* Discovering the Ecosystem Service Value Growth Characteristics of a Subtropical Soil Erosion Area Using a Remote-Sensing-Driven Mountainous Equivalent Factor Method
* Discriminability-Driven Channel Selection for Out-of-Distribution Detection
* Discriminating between Biotic and Abiotic Stress in Poplar Forests Using Hyperspectral and LiDAR Data
* Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation
* Discriminative Probing and Tuning for Text-to-Image Generation
* Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
* Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces
* Disentangled Pre-Training for Human-Object Interaction Detection
* Disentangled Prompt Representation for Domain Generalization
* Disentangled Representation Learning for Robust Radar Inter-Pulse Modulation Feature Extraction and Recognition
* Disentangled Sample Guidance Learning for Unsupervised Person Re-Identification
* Disentangling the Effects of Atmospheric and Soil Dryness on Autumn Phenology across the Northern Hemisphere
* Dispel Darkness for Better Fusion: A Controllable Visual Enhancer Based on Cross-Modal Conditional Adversarial Learning
* Dispersed Structured Light for Hyperspectral 3D Imaging
* DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
* Disrupting Anti-Spoofing Systems by Images of Consistent Identity
* Distilled Datamodel with Reverse Gradient Matching
* Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation
* Distilling ODE Solvers of Diffusion Models into Smaller Steps
* Distilling Semantic Priors from SAM to Efficient Image Restoration Models
* Distilling Vision-Language Models on Millions of Videos
* Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
* Distributed Deadlock-Free Task Offloading Algorithm for Integrated Communication-Sensing-Computing Satellites with Data-Dependent Constraints, A
* Distributed Fixed-Time Control for Leader-Steered Rigid Shape Formation With Prescribed Performance
* Distributed Memory Approximate Message Passing
* Distributed Online Ordinal Regression Based on VUS Maximization
* Distribution-Aware Knowledge Prototyping for Non-Exemplar Lifelong Person Re-Identification
* Distribution-Aware Multi-Label FixMatch for Semi-Supervised Learning on CheXpert
* Distribution-Level Multi-View Clustering for Unaligned Data
* Distributionally Generative Augmentation for Fair Facial Attribute Classification
* DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
* Dithering with Pascal Cellular Automaton
* DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
* DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
* DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
* DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
* Diversified and Personalized Multi-Rater Medical Image Segmentation
* Diversity-Aware Channel Pruning for StyleGAN Compression
* Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model
* Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble
* DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
* DLDiff: Image Detail-Guided Latent Diffusion Model for Low-Light Image Enhancement
* DLP-Fusion: Depth of Field, Light Source, and Polarization Fusion Toward Intelligent Optical Imaging for Complex Scenes
* DMAP: Decoupling-Driven Multi-Level Attribute Parsing for Interpretable Outfit Collocation
* DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
* DMR: Disentangling Marginal Representations for Out-of-Distribution Detection
* DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
* Do More With What You Have: Transferring Depth-Scale from Labeled to Unlabeled Domains
* Do Vision and Language Encoders Represent the World Similarly?
* Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
* DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
* Domain Adaptation for Satellite-Borne Multispectral Cloud Detection
* Domain Adaptation Using Pseudo Labels for COVID-19 Detection
* Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scans
* Domain Adaptation-Aware Transformer for Hyperspectral Object Tracking
* Domain Adaptive LiDAR Point Cloud Segmentation via Density-Aware Self-Training
* Domain Gap Embeddings for Generative Dataset Augmentation
* Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation
* Domain Prompt Learning with Quaternion Networks
* Domain Separation Graph Neural Networks for Saliency Object Ranking
* Domain Targeted Synthetic Plant Style Transfer using Stable Diffusion, LoRA and ControlNet
* Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
* Domain-Oriented Knowledge Transfer for Cross-Domain Recommendation
* Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
* Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
* Don't Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
* Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
* Doodle Your 3D: from Abstract Freehand Sketches to Precise 3D Shapes
* Doppler-Spread Space Target Detection Based on Overlapping Group Shrinkage and Order Statistics
* Doubly Abductive Counterfactual Inference for Text-Based Image Editing
* DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization
* DPHMs: Diffusion Parametric Head Models for Depth-Based Tracking
* DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
* DQ-HorizonNet: Enhancing Door Detection Accuracy in Panoramic Images via Dynamic Quantization
* Dr-SAM: An End-to-End Framework for Vascular Segmentation, Diameter Estimation, and Anomaly Detection on Angiography Images
* Dr.Bokeh: DiffeRentiable Occlusion-Aware Bokeh Rendering
* Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-Training via Differentiable Rendering of Line Segments
* Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
* Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
* DragDiffusion: Harnessing Diffusion Models for Interactive Point-Based Image Editing
* Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion
* DRCT: Saving Image Super-Resolution away from Information Bottleneck
* Dream Video: Composing Your Dream Videos with Customized Subject and Motion
* DREAM: Diffusion Rectification and Estimation-Adaptive Models
* DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
* DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
* DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
* DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
* DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
* DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
* DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
* Driver-Centric Data-Driven Model Predictive Vehicular Platoon With Longitudinal-Lateral Dynamics
* DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
* DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving
* Driving Everywhere with Large Language Model Policy Adaptation
* Driving Into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
* Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
* DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
* Drone-HAT: Hybrid Attention Transformer for Complex Action Recognition in Drone Surveillance Videos
* DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
* DSANet-KD: Dual Semantic Approximation Network via Knowledge Distillation for Rail Surface Defect Detection
* DSGG: Dense Relation Transformer for an End-to-End Scene Graph Generation
* DSIS-DPR: Structured Instance Segmentation and Diffusion Prior Refinement for Dental Anatomy Learning
* DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
* DSTCFuse: A Method based on Dual-cycled Cross-awareness of Structure Tensor for Semantic Segmentation via Infrared and Visible Image Fusion
* DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM
* Dual Consensus Anchor Learning for Fast Multi-View Clustering
* Dual DETRs for Multi-Label Temporal Action Detection
* Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
* Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
* Dual Prior Unfolding for Snapshot Compressive Imaging
* Dual Prototype Attention for Unsupervised Video Object Segmentation
* Dual Self-Paced Hashing for Image Retrieval
* Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation, A
* Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning
* Dual-Domain Fusion Network Based on Wavelet Frequency Decomposition and Fuzzy Spatial Constraint for Remote Sensing Image Segmentation
* Dual-Enhanced Coreset Selection with Class-Wise Collaboration for Online Blurry Class Incremental Learning
* Dual-Mode Approach for Vision-Based Navigation in a Lunar Landing Scenario, A
* Dual-Scale Transformer for Large-Scale Single-Pixel Imaging
* Dual-Signature Blockchain-Based Key Sharing Protocol for Secure V2V Communications in Multi-Domain IoV Environments
* Dual-Task Mutual Learning With QPHFM Watermarking for Deepfake Detection
* Dual-View Visual Contextualization for Web Navigation
* Dual-Wavelength Interferometric Detection Technology for Wind and Temperature Fields in the Martian Middle and Upper Atmosphere Based on LCTF
* DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
* DualGroup for 3D instance and panoptic segmentation
* DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
* DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
* DUSt3R: Geometric 3D Vision Made Easy
* DuST: Dual Swin Transformer for Multi-modal Video and Time-Series Modeling
* DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
* DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
* DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
* Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
* Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection
* Dynamic Analysis and Risk Assessment of Vegetation Net Primary Productivity in Xinjiang, China
* Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
* Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly Detection
* Dynamic Double Event-Triggered Anti-Disturbance Tracking Control for a 2-DOF Small Unmanned Helicopter
* Dynamic Ensemble Teacher-Student Distillation Framework for Light-Weight Fake Audio Detection
* Dynamic Graph Representation with Knowledge-Aware Attention for Histopathology Whole Slide Image Analysis
* Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
* Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution, A
* Dynamic Knowledge Adapter with Probabilistic Calibration for Generalized Few-Shot Semantic Segmentation
* Dynamic Learnable Logit Adjustment for Long-Tailed Visual Recognition
* Dynamic LiDAR Re-Simulation Using Compositional Neural Fields
* Dynamic Monitoring and Analysis of Ecological Environment Quality in Arid and Semi-Arid Areas Based on a Modified Remote Sensing Ecological Index (MRSEI): A Case Study of the Qilian Mountain National Nature Reserve
* Dynamic Passenger Route Guidance in the Multimodal Transit System with Graph Representation and Attention Based Deep Reinforcement Learning
* Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
* Dynamic Prompt Optimizing for Text-to-Image Generation
* Dynamic Rescheduling for Optimal Transportation Incident Management in Inland Waterways
* Dynamic Support Information Mining for Category-Agnostic Pose Estimation
* Dynamic Voxels Based on Ego-Conditioned Prediction: An Integrated Spatio-Temporal Framework for Motion Planning
* DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
* Dysen-VDM: Empowering Dynamics-Aware Text-to-Video Diffusion with LLMs
* DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
* E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
* E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data
* Each Performs Its Functions: Task Decomposition and Feature Assignment for Audio-Visual Segmentation
* Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
* EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
* Early Mission Calibration Performance of NOAA-21 VIIRS Reflective Solar Bands
* EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
* EarthMatch: Iterative Coregistration for Fine-grained Localization of Astronaut Photography
* EASE-DETR: Easing the Competition among Object Queries
* EasyDrag: Efficient Point-Based Manipulation on Diffusion Models
* EBDNet: Integrating Optical Flow With Kernel Prediction for Burst Denoising
* ECEA: Extensible Co-Existing Attention for Few-Shot Object Detection
* ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
* ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
* Eclipse: Disambiguating Illumination and Materials Using Unintended Shadows
* ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
* ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
* ED-DCFNet: an unsupervised encoder-decoder neural model for event-driven feature extraction and object tracking
* Edge-and-Mask Integration-Driven Diffusion Models for Medical Image Segmentation
* Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
* EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting
* Edit Friendly DDPM Noise Space: Inversion and Manipulations, An
* Edit One for All: Interactive Batch Image Editing
* Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
* EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
* Editorial for the Special Issue Aerosol and Atmospheric Correction, An
* Effective and Robust Adversarial Training Against Data and Label Corruptions
* Effective Ensemble Learning Framework for Affective Behaviour Analysis, An
* Effective Method for Detecting Violation of Helmet Rule for Motorcyclists, An
* Effective Video Mirror Detection with Inconsistent Motion Cues
* Effects of Extreme Climatic Events on the Autumn Phenology in Northern China Are Related to Vegetation Types and Background Climates
* Efficient 3D Implicit Head Avatar With Mesh-Anchored Hash Table Blendshapes
* Efficient Adaptive Large Neighborhood Search for Sensor-Weapon-Target Assignment
* Efficient Algorithm for Extracting Railway Tracks Based on Spatial-Channel Graph Convolutional Network and Deep Neural Residual Network, An
* Efficient Alternative Route Planning in Road Networks
* Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
* Efficient and Expressive Fully Policy-Hidden Ciphertext-Policy Attribute-Based Encryption Scheme for Satellite Service Systems, An
* Efficient Complex Immittance Spectral Frequency With the Perceptual-Metric-Based Codebook Search
* Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming, An
* Efficient Dataset Distillation via Minimax Diffusion
* Efficient Deconvolution With the Discrete Fourier Transform
* Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
* Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization
* Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models
* Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation
* Efficient High-Quality Vectorized Modeling of Large-Scale Scenes
* Efficient Hybrid Feature Interaction Network for Stereo Image Super-Resolution
* Efficient Hyperparameter Optimization with Adaptive Fidelity Identification
* Efficient Image Privacy Preservation Scheme for Smart City Applications Using Compressive Sensing and Multi-Level Encryption, An
* Efficient Light Field Image Super-Resolution via Progressive Disentangling
* Efficient local correlation volume for unsupervised optical flow estimation on small moving objects in large satellite images
* Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
* Efficient LWPooling: Rethinking the Wavelet Pooling for Scene Parsing
* Efficient Matching Game Approach to Association Formation in UAV-Enabled Hierarchical Distributed Learning, An
* Efficient Meshflow and Optical Flow Estimation from Event Cameras
* Efficient Model Stealing Defense with Noise Transition Matrix
* Efficient Multi-Scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
* Efficient Multitask Dense Predictor via Binarization
* Efficient On-Board Compression for Arbitrary-Shaped Cloud-Covered Remote Sensing Images via Adaptive Filling and Controllable Quantization
* Efficient Online Multi-Camera Tracking with Memory-Efficient Accumulated Appearance Features and Trajectory Validation
* Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
* Efficient Scene Recovery Using Luminous Flux Prior
* Efficient Skeleton-Based Action Recognition for Real-Time Embedded Systems
* Efficient Solution of Point-Line Absolute Pose
* Efficient Stitchable Task Adaptation
* Efficient Test-Time Adaptation of Vision-Language Models
* Efficient Transformer Adaptation with Soft Token Merging
* Efficient Uncertainty-Aware Collision Avoidance for Autonomous Driving Using Convolutions
* Efficient Video Stabilization via Partial Block Phase Correlation on Edge GPUs
* Efficient Vision-Language Pre-Training by Cluster Masking
* EfficientDreamer: High-Fidelity and Stable 3D Creation via Orthogonal-view Diffusion Priors
* Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
* EfficientNet-SAM: A Novel EfficientNet with Spatial Attention Mechanism for COVID-19 Detection in Pulmonary CT Scans
* EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
* EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
* Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning
* EFHQ: Multi-Purpose ExtremePose-Face-HQ Dataset
* EFormer: Enhanced Transformer Towards Semantic-Contour Features of Foreground for Portraits Matting
* Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
* Egocentric Vulnerable Road Users Trajectory Prediction With Incomplete Observation
* Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
* EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
* EgoGen: An Egocentric Synthetic Data Generator
* EgoSG: Learning 3D Scene Graphs from Egocentric RGB-D Sequences
* EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
* EGTR: Extracting Graph from Transformer for Scene Graph Generation
* EHIR: Energy-based Hierarchical Iterative Image Registration for Accurate PCB Defect Detection
* EL2NM: Extremely Low-light Noise Modeling Through Diffusion Iteration
* ElasticDiffusion: Training-Free Arbitrary Size Image Generation Through Global-Local Content Separation
* Electric Vehicle Next Charge Location Prediction
* Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
* ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
* EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
* Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
* EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
* Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
* EMCAD: Efficient Multi-Scale Convolutional Attention Decoding for Medical Image Segmentation
* Emergent Open-Vocabulary Semantic Segmentation from Off-the-Shelf Vision-Language Models
* EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
* EMOPortraits: Emotion-Enhanced Multimodal One-Shot Head Avatars
* Emotic Masked Autoencoder on Dual-views with Attention Fusion for Facial Expression Recognition
* EmotiEffNet and Temporal Convolutional Networks in Video-based Facial Expression Recognition and Action Unit Detection
* Emotion Recognition of Playing Musicians From EEG, ECG, and Acoustic Signals
* Emotion Recognition Using Transformers with Random Masking
* Emotional Speech-Driven 3D Body Animation via Disentangled Latent Diffusion
* EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
* Empirical Study of Scaling Law for Scene Text Recognition, An
* Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains, An
* Empirical Study on Multi-domain Robust Semantic Segmentation, An
* Empowering Real-World Image Super-Resolution With Flexible Interactive Modulation
* Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
* Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas, An
* Emu Edit: Precise Image Editing via Recognition and Generation Tasks
* En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
* End-to-End Approach for Handwriting Recognition: From Handwritten Text Lines to Complete Pages, An
* End-to-End Deep Learning Models for Gap Identification in Maize Fields
* End-to-End Neural Network Compression via l1/l2 Regularized Latency Surrogates
* End-to-end Solution for Tenebrio Molitor Rearing Monitoring with Uncertainty Estimation and Domain Shift Detection
* End-to-End Spatio-Temporal Action Localisation with Video Transformers
* End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
* End-to-End Vision Transformer Approach for Image Copy Detection, An
* Endow SAM with Keen Eyes: Temporal-Spatial Prompt Learning for Video Camouflaged Object Detection
* Energy-Efficient Cooperative Secure Communications in mmWave Vehicular Networks Using Deep Recurrent Reinforcement Learning
* Energy-Efficient Uncertainty-Aware Biomass Composition Prediction at the Edge
* Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation
* Enforcing Temporal Consistency for Color Constancy in Video Sequences
* Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
* Enhance Sample Efficiency and Robustness of End-to-End Urban Autonomous Driving via Semantic Masked World Model
* Enhanced Blue Band Vegetation Index (The Re-Modified Anthocyanin Reflectance Index (RMARI)) for Accurate Farmland Shelterbelt Extraction
* Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation
* Enhanced Dynamic Analysis for Malware Detection With Gradient Attack
* Enhanced Impacts of Extreme Weather Events on Forest: The Upper Valtellina (Italy) Case Study
* Enhanced Lung Cancer Diagnosis and Staging With HRNeT: A Deep Learning Approach
* Enhanced Monitoring of Sub-Seasonal Land Use Dynamics in Vietnam's Mekong Delta through Quantile Mapping and Harmonic Regression
* Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
* Enhanced Prototypical Network with Customized Region-Aware Convolution for Few-Shot SAR ATR
* Enhanced Scene Understanding and Situation Awareness for Autonomous Vehicles Based on Semantic Segmentation
* Enhancing 2D Representation Learning with a 3D Prior
* Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
* Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
* Enhancing Accuracy in Historical Forest Vegetation Mapping in Yunnan with Phenological Features, and Climatic and Elevation Variables
* Enhancing Alfalfa Biomass Prediction: An Innovative Framework Using Remote Sensing Data
* Enhancing Digital Twins with Human Movement Data: A Comparative Study of Lidar-Based Tracking Methods
* Enhancing Emotion Recognition with Pre-trained Masked Autoencoders and Sequential Learning
* Enhancing Image Classification Robustness through Adversarial Sampling with Delta Data Augmentation (DDA)
* Enhancing Inter-Class Separability With High-Order Strangers for Multi-View Clustering
* Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
* Enhancing Ki-67 Cell Segmentation with Dual U-Net Models: A Step Towards Uncertainty-Informed Active Learning
* Enhancing low-light images via dehazing principles: Essence and method
* Enhancing Multi-Label Deep Hashing for Image and Audio With Joint Internal Global Loss Constraints and Large Vision-Language Model
* Enhancing Multimodal Cooperation via Sample-Level Modality Valuation
* Enhancing Planning for Autonomous Driving via an Iterative Optimization Framework Incorporating Safety-Critical Trajectory Generation
* Enhancing Post-Training Quantization Calibration Through Contrastive Learning
* Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
* Enhancing Road Object Detection in Fisheye Cameras: An Effective Framework Integrating SAHI and Hybrid Inference
* Enhancing Robustness of Deep Reinforcement Learning Based Adaptive Traffic Signal Controllers in Mixed Traffic Environments Through Data Fusion and Multi-Discrete Actions
* Enhancing Significant Wave Height Retrieval with FY-3E GNSS-R Data: A Comparative Analysis of Deep Learning Models
* Enhancing stock market prediction through image encoding, pattern recognition, and ensemble learning with custom error correction techniques
* Enhancing Targeted Attack Transferability via Diversified Weight Pruning
* Enhancing the Power of OOD Detection via Sample-Aware Model Selection
* Enhancing the Transferability of Adversarial Attacks with Stealth Preservation
* Enhancing Timeliness in Asynchronous Vehicle Localization: A Signal-Multiplexing Network Measuring Approach
* Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
* Enhancing Unsupervised Semantic Segmentation Through Context-Aware Clustering
* Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
* Enhancing Vision-Language Pre-Training with Rich Supervisions
* Enhancing Visual Continual Learning with Language-Guided Supervision
* Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
* Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
* Enhancing zero-shot object detection with external knowledge-guided robust contrast learning
* Enrich, Distill and Fuse: Generalized Few-Shot Semantic Segmentation in Remote Sensing Leveraging Foundation Model's Assistance
* Ensemble Diversity Facilitates Adversarial Transferability
* Ensemble Learning Approach With Attention Mechanism for Detecting Pavement Distress and Disaster-Induced Road Damage, An
* Ensemble Predictors: Possibilistic Combination of Conformal Predictors for Multivariate Time Series Classification
* Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
* Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
* Entropy-Based Feature Extraction Model for Fundus Images with Deep Learning Model
* EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
* Epistemic Uncertainty Quantification for Pretrained Neural Networks
* EPQ-GAN: Evolutionary Perceptual Quality Assessment Generative Adversarial Network for Image Dehazing
* Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement
* Equivariant Multi-Modality Image Fusion
* Equivariant Plug-and-Play Image Reconstruction
* ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
* Error Analysis of Non-Time-Synchronized Lightning Positioning Method
* Error Detection in Egocentric Procedural Task Videos
* Error Model and Concise Temporal Network for Indirect Illumination in 3D Reconstruction
* ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
* ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
* EscherNet: A Generative Model for Scalable View Synthesis
* ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-View Images
* Establishment of Remote Sensing Inversion Model and Its Application in Pollution Source Identification: A Case Study of East Lake in Wuhan
* Estimating Channels With Hundreds of Sub-Paths for MU-MIMO Uplink: A Structured High-Rank Tensor Approach
* Estimating Chlorophyll-a and Phycocyanin Concentrations in Inland Temperate Lakes across New York State Using Sentinel-2 Images: Application of Google Earth Engine for Efficient Satellite Image Processing
* Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
* Estimating Extreme 3D Image Rotations using Cascaded Attention
* Estimating Global Gross Primary Production Using an Improved MODIS Leaf Area Index Dataset
* Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
* Estimating Perceived Mental Workload From Eye-Tracking Data Based on Benign Anisocoria
* Estimating Rootzone Soil Moisture by Fusing Multiple Remote Sensing Products with Machine Learning
* Estimating the Semantics via Sector Embedding for Image-Text Retrieval
* Estimating Three-Dimensional Resistivity Distribution with Magnetotelluric Data and a Deep Learning Algorithm
* Estimating Vertical Distribution of Total Suspended Matter in Coastal Waters Using Remote-Sensing Approaches
* Estimation Model and Spatio-Temporal Analysis of Carbon Emissions from Energy Consumption with NPP-VIIRS-like Nighttime Light Images: A Case Study in the Pearl River Delta Urban Agglomeration of China
* Estimation of Aerosol Characteristics from Broadband Solar Radiation Measurements Carried Out in Southern Algeria
* Estimation of Forage Biomass in Oat (Avena sativa) Using Agronomic Variables through UAV Multispectral Imaging
* Estimation of IFOV Inter-Channel Deviation for Microwave Radiation Imager Onboard FY-3G Satellite
* Estimation of Soil Salinity by Combining Spectral and Texture Information from UAV Multispectral Images in the Tarim River Basin, China
* Estimation of Vehicular Journey Time Variability by Bayesian Data Fusion With General Mixture Model
* Estimation, Spatiotemporal Dynamics, and Driving Factors of Grassland Biomass Carbon Storage Based on Machine Learning Methods: A Case Study of the Hulunbuir Grassland
* eTraM: Event-Based Traffic Monitoring Dataset
* EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
* Evaluating and Improving Compositional Text-to-Visual Generation
* Evaluating Autonomous Vehicle Safety Performance Through Analysis of Pre-Crash Trajectories of Powered Two-Wheelers
* Evaluating Confidence Calibration in Endoscopic Diagnosis Models
* Evaluating Ecological Drought Vulnerability from Ecosystem Service Value Perspectives in North China
* Evaluating Flood Damage to Paddy Rice Fields Using PlanetScope and Sentinel-1 Data in North-Western Nigeria: Towards Potential Climate Adaptation Strategies
* Evaluating Multimodal Large Language Models across Distribution Shifts and Augmentations
* Evaluating Pedestrian Trajectory Prediction Methods With Respect to Autonomous Driving
* Evaluating Satellite-Based Water Quality Sensing of Inland Waters on Basis of 100+ German Water Bodies Using 2 Different Processing Chains
* Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment
* Evaluating the Integration of Morph Attack Detection in Automated Face Recognition Systems
* Evaluating the Prediction Performance of the WRF-CUACE Model in Xinjiang, China
* Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
* Evaluation of Ecological Environment Quality Using an Improved Remote Sensing Ecological Index Model
* Evaluation of Ecosystem Quality and Its Response to Aridity on the Qinghai-Tibet Plateau, An
* Evaluation of Future Climate Change Impacts on Key Elements of the Water-Carbon Cycle Using a Physics-Based Ecohydrological Model in Sanchuan River Basin, Loess Plateau, An
* Evaluation of Reanalysis and Satellite Products against Ground-Based Observations in a Desert Environment
* Evaluation of the Monitoring Capabilities of Remote Sensing Satellites for Maritime Moving Targets
* Evaluation of the Operational Global Ocean Wave Forecasting System of China
* Evaluation of the Surface Downward Longwave Radiation Estimation Models over Land Surface
* EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
* EvDiG: Event-guided Direct and Global Components Separation
* Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
* Event Stream-Based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
* Event-Assisted Blurriness Representation Learning for Blurry Image Unfolding
* Event-Assisted Low-Light Video Object Segmentation
* Event-based Ball Spin Estimation in Sports
* Event-Based Eye Tracking. AIS 2024 Challenge Survey
* Event-Based Optical Flow via Transforming Into Motion-Dependent View
* Event-based Structure-from-Orbit
* Event-Based Visible and Infrared Fusion via Multi-Task Collaboration
* Event-Triggered Heading Control of an Energy-Efficient Underwater Gliding Robot
* EventDance: Unsupervised Source-Free Cross-Modal Adaptation for Event-Based Object Recognition
* EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
* EventPS: Real-Time Photometric Stereo Using an Event Camera
* Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
* EVS-Assisted Joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation Through Sensor Inverse Modeling
* Exact Fusion via Feature Distribution Matching for Few-Shot Image Generation
* Exact Obstacle Avoidance for Autonomous Vehicles in Polygonal Domains
* EXACT: How to train your accuracy
* ExACT: Language-Guided Conceptual Reasoning and Uncertainty Estimation for Event-Based Action Recognition and More
* Examining the Impact of Topography and Vegetation on Existing Forest Canopy Height Products from ICESat-2 ATLAS/GEDI Data
* ExerAIde: AI-assisted Multimodal Diagnosis for Enhanced Sports Performance and Personalised Rehabilitation
* ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations
* Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
* Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous Tasks, The
* Experimental and simulation study of a four-degree of freedom robot arm moving through space planner path
* Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users
* Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing Their Contributions
* Exploiting Bidirectional Quality Impulse for Reference Picture Resampled Gaming Video Coding
* Exploiting CLIP Self-Consistency to Automate Image Augmentation for Safety Critical Scenarios
* Exploiting Diffusion Prior for Generalizable Dense Prediction
* Exploiting Generative Diffusion Prior With Latent Low-Rank Regularization for Image Inpainting
* Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
* Exploiting Milano Retinex Contrast to Enhance Images with Strong Changes of Light Intensity
* Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
* Exploration of Data Augmentation Techniques for Bush Detection in Blueberry Orchards
* Exploration of Deep-Learning-Based Error-Correction Methods for Meteorological Remote-Sensing Data: A Case Study of Atmospheric Motion Vectors
* Exploration of the Urbanization Process and Its Impact on Vegetation in 125 Resource-Based Cities in China and Comparison with Other Cities
* Exploring AI-Based Satellite Pose Estimation: from Novel Synthetic Dataset to In-Depth Performance Evaluation
* Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap
* Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
* Exploring Facial Expression Recognition through Semi-Supervised Pre-training and Temporal Modeling
* Exploring Orthogonality in Open World Object Detection
* Exploring Pose-Aware Human-Object Interaction via Hybrid Learning
* Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction Models
* Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
* Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
* Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery
* Exploring Text-to-Motion Generation with Human Preference
* Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation
* Exploring the Impact of Dataset Bias on Dataset Distillation
* Exploring the Limits: Applying State-of-the-Art Stereo Matching Algorithms to Rectified Ultra-Wide Stereo
* Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
* Exploring the Role of Audio in Video Captioning
* Exploring the Spectral Prior for Hyperspectral Image Super-Resolution
* Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
* Exploring the usage of diffusion models for thermal image super-resolution: a generic, uncertainty-aware approach for guided and non-guided schemes
* Exploring the Usage of Pre-trained Features for Stereo Matching
* Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following
* Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
* ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
* Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
* Extended Wien and Planck Loci
* Extending global-local view alignment for self-supervised learning with remote sensing imagery
* Extraction of Alteration Information from Hyperspectral Data Based on Kernel Extreme Learning Machine
* ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
* Extreme Point Supervised Instance Segmentation
* Eyes of a Hawk and Ears of a Fox: Part Prototype Network for Generalized Zero-Shot Learning
* Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
* F3Loc: Fusion and Filtering for Floorplan Localization
* Face Forgery Detection via Multi-Feature Fusion and Local Enhancement
* Face2Diffusion for Fast and Editable Face Personalization
* FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
* FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-Shot Subject-Driven Generation
* FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
* FaceLift: Semi-Supervised 3D Facial Landmark Localization
* Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
* FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
* Facial Action Unit Representation Based on Self-Supervised Learning With Ensembled Priori Constraints
* Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
* FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
* FADES: Fair Disentanglement with Sensitive Relevance
* Fair Attention Network for Robust Visual Question Answering
* Fair Federated Learning Under Domain Skew with Local Consistency and Domain Diversity
* Fair-VPT: Fair Visual Prompt Tuning for Image Classification
* FairCLIP: Harnessing Fairness in Vision-Language Learning
* FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
* FairRAG: Fair Human Generation via Fair Retrieval Augmentation
* FairSSD: Understanding Bias in Synthetic Speech Detectors
* Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
* Fake it to make it: Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis
* FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
* Fall detection algorithm based on global and local feature extraction
* Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
* FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker
* FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
* Fast Adaptation for Human Pose Estimation via Meta-Optimization
* Fast and Effective: Progressive Hierarchical Fusion Classification for Remote Sensing Images
* Fast and Robust LiDAR-Inertial Odometry by Tightly-Coupled Iterated Kalman Smoother and Robocentric Voxels
* Fast and Robust Range Alignment Method for ISAR Imaging Based on a Deep Learning Network and Regional Multi-Scale Minimum Entropy Method, A
* Fast Building Instance Proxy Reconstruction for Large Urban Scenes
* Fast Computing Model for the Oxygen A-Band High-Spectral-Resolution Absorption Spectra Based on Artificial Neural Networks, A
* Fast Exclusion Candidate Identification Based on Sparse Estimation for ARAIM Fault Exclusion Process
* Fast H.266/VVC Intra Coding by Early Skipping Joint Coding of Chroma Residuals
* Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
* Fast Sequential Similarity Detection Algorithm for Multi-Source Image Matching, A
* Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models
* Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
* FastMAC: Stochastic Spectral Sampling of Correspondence Graph
* Fault Detection for Switched Positive Systems With Application to Traffic Signal Systems
* FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
* FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning
* FD-Net: A Single-Stage Fire Detection Framework for Remote Sensing in Complex Environments
* FE-Det: An Effective Traffic Object Detection Framework for Fish-Eye Cameras
* Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
* Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions
* Feature decomposition-based gaze estimation with auxiliary head pose regression
* Feature Fusion-Based Data Augmentation Method for Small Object Detection
* Feature Pyramid Network Based Spatial Attention and Cross-Level Semantic Similarity for Diseases Segmentation From Capsule Endoscopy Images
* Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
* Feature Reconstruction With Disruption for Unsupervised Video Anomaly Detection
* Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration
* Feature-Selection-Based Unsupervised Transfer Learning for Change Detection from VHR Optical Images
* FedAS: Bridging Inconsistency in Personalized Federated Learning
* Federated Generalized Category Discovery
* Federated Hyperparameter Optimization through Reward-Based Strategies: Challenges and Insights
* Federated Learning with a Single Shared Image
* Federated Online Adaptation for Deep Stereo
* FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
* FedMef: Towards Memory-Efficient Federated Dynamic Pruning
* FedProK: Trustworthy Federated Class-Incremental Learning via Prototypical Feature Knowledge Transfer
* FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
* FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
* FedUV: Uniformity and Variance for Heterogeneous Federated Learning
* Feedback-Guided Autonomous Driving
* Fetal ECG Extraction on Time-Frequency Domain using Conditional GAN
* Few-Shot Action Recognition via Multi-View Representation Learning
* Few-Shot Fine-Grained Image Classification via Multi-Frequency Neighborhood and Double-Cross Modulation
* Few-Shot Font Generation by Learning Style Difference and Similarity
* Few-Shot Learner Parameterization by Diffusion Time-Steps
* Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head
* Few-Shot Object Detection with Foundation Models
* FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
* FICNet: An End to End Network for Free-View Image Coding
* Fiducial Reference Measurement for Greenhouse Gases (FRM4GHG)
* Finding AI-Generated Faces in the Wild
* Finding Lottery Tickets in Vision Models via Data-Driven Spectral Foresight Pruning
* Fine-Grained Bipartite Concept Factorization for Clustering
* Fine-Grained High-Resolution Remote Sensing Image Change Detection by SAM-UNet Change Detection Model
* Fine-Grained Metro-Trip Detection from Cellular Trajectory Data Using Local and Global Spatial-Temporal Characteristics
* Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
* Fine-Grained Temporal-Enhanced Transformer for Dynamic Facial Expression Recognition
* Fine-Granularity Alignment for Text-Based Person Retrieval Via Semantics-Centric Visual Division
* Fine-MVO: Toward Fine-Grained Feature Enhancement for Self-Supervised Monocular Visual Odometry in Dynamic Environments
* Fine-Scale Eddies Detected by SWOT in the Kuroshio Extension
* FineParser: A Fine-Grained Spatio-Temporal Action Parser for Human-Centric Action Quality Assessment
* FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
* FINER: Flexible Spectral-Bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions
* FineRehab: A Multi-modality and Multi-task Dataset for Rehabilitation Analysis
* FineSports: A Multi-Person Hierarchical Sports Video Dataset for Fine-Grained Action Understanding
* Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
* Finger Vein Recognition Algorithm Based on the Histogram of Variable Curvature Directional Binary Statistics, A
* Fingerprint membership and identity inference against generative adversarial networks
* Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
* FIQA-FAS: Face Image Quality Assessment Based Face Anti-Spoofing
* First Extension of the Robust Satellite Technique RST-FLOOD to Sentinel-2 Data for the Mapping of Flooded Areas: The Case of the Emilia Romagna (Italy) 2023 Event, A
* First Validation of Aerosol Optical Parameters Retrieved from the Terrestrial Ecosystem Carbon Inventory Satellite (TECIS) and Its Application, The
* FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures
* FisheyeBEVSeg: Surround View Fisheye Cameras based Bird's-Eye View Segmentation for Autonomous Driving
* Fitness of Multi-Resolution Remotely Sensed Data for Cadastral Mapping in Ekiti State, Nigeria
* Fitting Flats to Flats
* Fixed Frequency Highly Efficient Resonant Converter With Low Component Count for CC and CV Charges of Electric Vehicles Batteries
* Fixed Point Diffusion Models
* Flare7K++: Mixing Synthetic and Real Datasets for Nighttime Flare Removal and Beyond
* FlashAvatar: High-Fidelity Head Avatar with Efficient Gaussian Embedding
* FlashEval: Towards Fast and Accurate Evaluation of Text-to-Image Diffusion Generative Models
* Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
* Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball
* Flexible and Parameter-Free Graph Learning for Multi-View Spectral Clustering
* Flexible Biometrics Recognition: Bridging the Multimodality Gap Through Attention, Alignment and Prompt Tuning
* Flexible Depth Completion for Sparse and Varying Point Densities
* Flexible Window-based Self-attention Transformer in Thermal Image Super-Resolution
* FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning
* FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
* Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
* Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
* FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
* FlowerFormer: Empowering Neural Architecture Encoding Using a Flow-Aware Graph Transformer
* FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
* FlowIE: Efficient Image Enhancement via Rectified Flow
* FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
* FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
* FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
* FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
* FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
* Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training
* Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
* Focusing on What Matters: Fine-grained Medical Activity Recognition for Trauma Resuscitation via Actor Tracking
* FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
* Food Portion Estimation via 3D Object Scaling
* Fooling Polarization-Based Vision Using Locally Controllable Polarizing Projection
* Forecasting of 3D Whole-Body Human Poses with Grasping Objects
* Forest Change Monitoring Based on Block Instance Sampling and Homomorphic Hypothesis Margin Evaluation
* Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
* Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
* Forward-Forward Algorithm for Hyperspectral Image Classification
* FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
* Four-Dimensional Parameter Estimation for Mixed Far-Field and Near-Field Target Localization Using Bistatic MIMO Arrays and Higher-Order Singular Value Decomposition
* Fourier Prior-Based Two-Stage Architecture for Image Restoration
* Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
* Fourier-Basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
* FPGA-Based Approach for Compressing and Accelerating Depthwise Separable Convolution, An
* FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
* Fractals as Pre-training Datasets for Anomaly Detection and Localization
* Free3D: Consistent Novel View Synthesis Without 3D Representation
* Free: Faster and Better Data-Free Meta-Learning
* FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
* FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
* FreeDrag: Feature Dragging for Reliable Point-Based Image Editing
* FreeKD: Knowledge Distillation via Semantic Frequency Prompt
* FreeMan: Towards Benchmarking 3D Human Pose Estimation Under Real-World Conditions
* FreePoint: Unsupervised Point Cloud Instance Segmentation
* FreeU: Free Lunch in Diffusion U-Net
* Freeway Traffic Flow Prediction Model Based on a Generalized Dynamic Spatio-Temporal Graph Convolutional Network, A
* FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
* Frequency Decoupling for Motion Magnification Via Multi-Level Isomorphic Architecture
* Frequency Domain Auxiliary Network for Image Retrieval, A
* Frequency-Adaptive Dilated Convolution for Semantic Segmentation
* Frequency-Aware Event-Based Video Deblurring for Real-World Motion Blur
* Frequency-Based Matcher for Long-Tailed Semantic Segmentation
* Fresco: Spatial-Temporal Correspondence for Zero-Shot Video Translation
* Friendly Sharpness-Aware Minimization
* From 2D Portraits to 3D Realities: Advancing GAN Inversion for Enhanced Image Synthesis
* From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
* From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
* From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
* From Coarse to Fine-Grained Open-Set Recognition
* From Correspondences to Pose: Non-Minimal Certifiably Optimal Relative Pose Without Disambiguation
* From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
* From Gutenberg to Llamas: Print Optimization Through First Principles and AI
* From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
* From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
* From Polar Day to Polar Night: A Comprehensive Sun and Star Photometer Study of Trends in Arctic Aerosol Properties in Ny-Ålesund, Svalbard
* From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
* From Synthetic to Real: A Calibration-free Pipeline for Few-shot Raw Image Denoising
* From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers
* From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
* Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
* Frozen Feature Augmentation for Few-Shot Image Classification
* FSC: Few-Point Shape Completion
* FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-Pose, and Facial Expression Features
* Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
* Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing
* Fully Geometric Panoramic Localization
* Fully Sparse Fusion for 3D Object Detection
* Fully Test-time Adaptation for Object Detection
* Fun with Flags: Robust Principal Directions via Flag Manifolds
* Functional Diffusion
* Functional Safety and Performance Analysis of Autonomous Route Management for Autonomous Train Control System
* Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
* Fusion Transformer with Object Mask Guidance for Image Forgery Analysis
* FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
* Fuzzy text/non-text classification of document images based on morphological operator, wavelet transform, and strong feature vector
* G-FARS: Gradient-Field-Based Auto-Regressive Sampling for 3D Part Grouping
* G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
* G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
* G3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding
* G3DR: Generative 3D Reconstruction in ImageNet
* GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
* Gain-first or Exposure-first: Benchmark for Better Low-light Video Photography and Enhancement
* GaitDAN: Cross-View Gait Recognition via Adversarial Domain Adaptation
* GALA: Generating Animatable Layered Assets from a Single Scan
* Game-Based Approximate Optimal Motion Planning for Safe Human-Swarm Interaction
* GARField: Group Anything with Radiance Fields
* Garment Recovery with Shape and Deformation Priors
* GART: Gaussian Articulated Template Models
* Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging
* Gated Fields: Learning Scene Reconstruction from Gated Videos
* Gated Siamese Fusion Network based on multimodal deep and hand-crafted features for personality traits assessment
* GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
* Gaussian Head Avatar: Ultra High-Fidelity Head Avatar via Dynamic Gaussians
* Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
* Gaussian Shadow Casting for Neural Characters
* Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks
* Gaussian Splatting SLAM
* Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
* GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
* GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
* GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
* GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
* GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
* GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
* GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
* Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze Scanpath
* GBC: Guided Alignment and Adaptive Boosting CLIP Bridging Vision and Language for Robust Action Recognition
* GDA: Generalized Diffusion for Robust Test-Time Adaptation
* Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
* GEARS: Local Geometry-Aware Hand-Object Interaction Synthesis
* Gene-Level Representation Learning via Interventional Style Transfer in Optical Pooled Screening
* GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
* General and Efficient Training for Transformer via Token Expansion, A
* General Deformable RoI Pooling and Semi-Decoupled Head for Object Detection
* General Distortion Metric Based Histogram Shifting for Reversible Data Hiding
* General Framework for Jersey Number Recognition in Sports Video, A
* General Object Foundation Model for Images and Videos at Scale
* General On-Orbit Absolute Radiometric Calibration Method Compatible with Multiple Imaging Conditions, A
* General Point Model Pretraining with Autoencoding and Autoregressive
* Generalizable Face Landmarking Guided by Conditional Face Warping
* Generalizable Novel-View Synthesis Using a Stereo Camera
* Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
* Generalized Event Cameras
* Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework
* Generalized Foggy-Scene Semantic Segmentation by Frequency Decoupling
* Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
* Generalized Predictive Model for Autonomous Driving
* Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer
* Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
* Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
* Generate Subgoal Images Before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
* Generating Accurate and Diverse Audio Captions Through Variational Autoencoder Framework
* Generating Content for HDR Deghosting from Frequency View
* Generating Diverse Agricultural Data for Vision-Based Farming Applications
* Generating Enhanced Negatives for Training Language-Based Object Detectors
* Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
* Generating Human Motion in 3D Scenes from Text Descriptions
* Generating Illustrated Instructions
* Generating Material-Aware 3D Models from Sparse Views
* Generating neural architectures from parameter spaces for multi-agent reinforcement learning
* Generating Non-Stationary Textures Using Self-Rectification
* Generation of Structurally Realistic Retinal Fundus Images with Diffusion Models
* Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
* Generative Adversarial Networks for Biomedical Imaging
* Generative Approach for Wikipedia-Scale Visual Entity Recognition, A
* Generative Dataset Distillation: Balancing Global Structure and Local Details
* Generative Exploration of Cuisine Transfer, A
* Generative Image Dynamics
* Generative Latent Coding for Ultra-Low Bitrate Image Compression
* Generative Multi-modal Models are Good Class-Incremental Learners
* Generative Multimodal Models are In-Context Learners
* Generative Powers of Ten
* Generative Proxemics: A Prior for 3D Social Interaction from Images
* Generative Quanta Color Imaging
* Generative Region-Language Pretraining for Open-Ended Object Detection
* Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
* Generative Unlearning for Any Identity
* Generic-to-Specific Distillation of Masked Autoencoders
* GenesisTex: Adapting Image Denoising Diffusion to Texture Space
* Genetic Algorithm Empowering Unsupervised Learning for Optimizing Building Segmentation from Light Detection and Ranging Point Clouds
* GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
* GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
* GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
* GenN2N: Generative NeRF2NeRF Translation
* GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
* GenTron: Diffusion Transformers for Image and Video Generation
* Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
* GenVideo: One-shot target-image and shape aware video editing using T2I diffusion models
* GenZI: Zero-Shot 3D Human-Scene Interaction Generation
* GeoAuxNet: Towards Universal 3D Representation Learning for Multi-Sensor Point Clouds
* GeoChat: Grounded Large Vision-Language Model for Remote Sensing
* GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions
* Geographical Entity Management Model Based on Multi-Classification
* GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots
* Geometric Characterization of the Mateur Plain in Northern Tunisia Using Vertical Electrical Sounding and Remote Sensing Techniques
* Geometric Model for Polarization Imaging on Projective Cameras, A
* Geometric Prior Based Deep Human Point Cloud Geometry Compression
* Geometrically-Driven Aggregation for Zero-Shot 3D Point Cloud Understanding
* Geometry Transfer for Stylizing Radiance Fields
* Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields
* GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
* GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
* GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
* GESCAM: A Dataset and Method on Gaze Estimation for Classroom Attention Measurement
* GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
* GHNeRF: Learning Generalizable Human Features with Efficient Neural Radiance Fields
* GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
* GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
* GIS-Based Analytical Hierarchy Process for Identifying Groundwater Potential Zones in Punjab, Pakistan
* GLACE: Global Local Accelerated Coordinate Encoding
* GLaMM: Pixel Grounding Large Multimodal Model
* GLID: Pre-training a Generalist Encoder-Decoder Vision Model
* GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
* GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
* Global and Hierarchical Geometry Consistency Priors for Few-Shot NeRFs in Indoor Scenes
* Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
* Global Latent Neural Rendering
* GLOW: Global Layout Aware Attacks on Object Detection
* GM-DETR: Generalized Multispectral DEtection TRansformer with Efficient Fusion Encoder for Visible-Infrared Detection
* GNSS-IR Soil Moisture Retrieval Using Multi-Satellite Data Fusion Based on Random Forest
* GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
* GOENet: Group Operations Enhanced Binary Neural Network for Efficient Image Classification
* Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
* GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
* GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
* Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
* GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-Aware Panoramic Semantic Segmentation
* GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
* GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
* GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
* GPP of a Chinese Savanna Ecosystem during Different Phenological Phases Simulated from Harmonized Landsat and Sentinel-2 Data
* GPS-Gaussian: Generalizable Pixel-Wise 3D Gaussian Splatting for Real-Time Human Novel View Synthesis
* GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
* GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
* GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
* GPT4Point: A Unified Framework for Point-Language Understanding and Generation
* GraCo: Granularity-Controllable Interactive Segmentation
* Grad-CAMO: Learning Interpretable Single-Cell Morphological Profiles from 3D Cell Painting Images
* Gradient Alignment for Cross-Domain Face Anti-Spoofing
* Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
* Gradient-based Parameter Selection for Efficient Fine-Tuning
* GraFIQs: Face Image Quality Assessment Using Gradient Magnitudes
* GRAM: Global Reasoning for Multi-Page VQA
* Graph Convolutional Networks With Adaptive Neighborhood Awareness
* Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation
* GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
* GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts
* Greedy Capon Beamformer
* GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
* GRIB: Combining Global Reception and Inductive Bias For Human Segmentation and Matting
* Grid Diffusion Models for Text-to-Video Generation
* GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions
* Gridless DOA Estimation Method for Arbitrary Array Geometries Based on Complex-Valued Deep Neural Networks
* Ground-VIO: Monocular Visual-Inertial Odometry With Online Calibration of Camera-Ground Geometric Parameters
* Grounded Question-Answering in Long Egocentric Videos
* Grounded Text-to-Image Synthesis with Attention Refocusing
* Groundhog: Grounding Large Language Models to Holistic Segmentation
* Grounding and Enhancing Grid-based Models for Neural Fields
* Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
* Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
* Group Multi-View Transformer for 3D Shape Analysis With Spatial Encoding
* GroupContrast: Semantic-Aware Self-Supervised Representation Learning for 3D Understanding
* Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-Based Visual Relationship Detection
* GS-IR: 3D Gaussian Splatting for Inverse Rendering
* GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
* GSAM+Cutie: Text-Promptable Tool Mask Annotation for Endoscopic Video
* GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
* GSVA: Generalized Segmentation via Multimodal Large Language Models
* Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
* Guided Slot Attention for Unsupervised Video Object Segmentation
* H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
* H3Net: Irregular Posture Detection by Understanding Human Character and Core Structures
* Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
* Hairy Ground Truth Enhancement for Semantic Segmentation
* HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
* Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
* HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
* HaLViT: Half of the Weights are Enough
* HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
* HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
* HanDiffuser: Text-to-Image Generation with Realistic Hand Appearances
* HardMo: A Large-Scale Hardcase Dataset for Motion Capture
* Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection
* Harmonic/Percussive Source Separation Based on Anisotropic Smoothness of Magnitude Spectrograms via Convex Optimization
* HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
* Harnessing Large Language Models for Training-Free Video Anomaly Detection
* Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
* Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
* HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing
* Hash-Based Gaussian Mixture Model (HGMM) for Roadside LiDAR Smart Infrastructure Applications
* HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
* HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
* Hazard and Safety Analysis of Machine-Learning-Based Perception Capabilities in Autonomous Vehicles
* Hazard Susceptibility Mapping with Machine and Deep Learning: A Literature Review
* HC-MVSNet: A probability sampling-based multi-view-stereo network with hybrid cascade structure for 3D reconstruction
* HCLR-Net: Hybrid Contrastive Learning Regularization with Locally Randomized Perturbation for Underwater Image Enhancement
* HDQMF: Holographic Feature Decomposition using Quantum Algorithms
* HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
* Heading Measurement Frame Based on Atmospheric Scattering Beams for Intelligent Vehicle
* HEAL-SWIN: A Vision Transformer on the Sphere
* Hearing Anything Anywhere
* Heat Kernel Diffusion for Enhanced Late Fusion Multi-View Clustering
* Heterogeneous Graph Network for Action Detection
* Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos
* Heterogeneous Window Transformer for Image Denoising
* HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
* Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
* Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
* Hierarchical Competition Learning for Pairwise Wheel Grounding Points Estimation
* Hierarchical Correlation Clustering and Tree Preserving Embedding
* Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
* Hierarchical Histogram Threshold Segmentation: Auto-terminating High-detail Oversegmentation
* Hierarchical Intra-Modal Correlation Learning for Label-Free 3D Semantic Segmentation
* Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment
* Hierarchical Patch Diffusion Models for High-Resolution Video Generation
* Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
* HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN
* HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
* HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
* High Precision Time Synchronization Strategy for Low-Cost Embedded GNSS/MEMS-IMU Integrated Navigation Module
* High Quality Reference Feature for Two Stage Bracketing Image Restoration and Enhancement
* High-Efficiency Forward Modeling of Gravitational Fields in Spherical Harmonic Domain with Application to Lunar Topography Correction
* High-fidelity Person-centric Subject-to-Image Synthesis
* High-Level Feature Guided Decoding for Semantic Segmentation
* High-Performance Image Steganography Scheme Based on Dual-Adversarial Networks, A
* High-Precision Heterogeneous Satellite Image Manipulation Localization: Feature Point Rules and Semantic Similarity Measurement
* High-Quality Facial Geometry and Appearance Capture at Home
* High-Resolution and Robust Microwave Correlation Imaging Method Based on URRF Using MC-AAMPE Algorithm, A
* High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers
* High-Resolution Sea Surface Target Detection Using Bi-Frequency High-Frequency Surface Wave Radar
* High-Visibility Edge-Highlighting Visualization of 3D Scanned Point Clouds Based on Dual 3D Edge Extraction
* Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
* Highly Efficient Compressive Sensing Algorithm Based on Root-Sparse Bayesian Learning for RFPA Radar, A
* HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
* HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
* HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
* Himawari-8 Sea Surface Temperature Products from the Australian Bureau of Meteorology
* Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks
* HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection
* HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
* HIPTrack: Visual Tracking with Historical Prompts
* HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
* HirFormer: Dynamic High Resolution Transformer for Large-Scale Image Shadow Removal
* Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks
* HIT: Estimating Internal Human Implicit Tissues from the Body Surface
* HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer
* HIVE: Harnessing Human Feedback for Instructional Visual Editing
* HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution
* HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
* hmOS: An Extensible Platform for Task-Oriented Human-Machine Computing
* HNN: Hierarchical Noise-Deinterlace Net Towards Image Denoising
* HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment
* HOIAnimator: Generating Text-Prompt Human-Object Animations Using Novel Perceptive Diffusion Models
* HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
* HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
* HOIST-Former: Hand-Held Objects Identification, Segmentation, and Tracking in the Wild
* HOLD: Category-Agnostic 3D Reconstruction of Interacting Hands and Objects from Video
* Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
* Holistic Features are Almost Sufficient for Text-to-Video Retrieval
* Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
* Holodeck: Language Guided Generation of 3D Embodied AI Environments
* Holoported Characters: Real-Time Free-Viewpoint Rendering of Humans from Sparse RGB Cameras
* HoloVic: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
* HomoFormer: Homogenized Transformer for Image Shadow Removal
* Honeybee: Locality-Enhanced Projector for Multimodal LLM
* Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
* HouseCat6D: A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
* How Far Ahead Should Autonomous Vehicles Start Resolving Predicted Conflicts? Exploring Uncertainty-Based Safety-Efficiency Trade-Off
* How Far can we Compress Instant-NGP-Based NeRF?
* How is Visual Attention Influenced by Text Guidance? Database and Model
* How Much You Ate? Food Portion Estimation on Spoons
* How SAM Perceives Different mp-MRI Brain Tumor Domains?
* How Suboptimal is Training rPPG Models with Videos and Targets from Different Body Sites?
* How to Benchmark Vision Foundation Models for Semantic Segmentation?
* How to Configure Good In-Context Sequence for Visual Question Answering
* How to Design a Cheap Music Detection System Using a Simple Multilayer Perceptron With Temporal Integration
* How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
* How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
* How to Train Neural Field Representations: A Comprehensive Study and Benchmark
* HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
* HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
* HRVDA: High-Resolution Visual Document Assistant
* HRWS SAR Motion Compensation Method with Multichannel Phase Correction, A
* HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
* HUGS: Human Gaussian Splats
* Human Gaussian Splatting: Real-Time Rendering of Animatable Avatars
* Human Motion Prediction Under Unexpected Perturbation
* Human-Guided Deep Reinforcement Learning for Optimal Decision Making of Autonomous Vehicles
* Human-in-the-Loop Segmentation of Multi-species Coral Imagery
* HumanFormer: Human-centric Prompting Multi-modal Perception Transformer for Referring Crowd Detection
* HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
* HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
* HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
* HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
* HumMUSS: Human Motion Understanding Using State Space Models
* HUNTER: Unsupervised Human-Centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
* Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
* Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception, A
* Hybrid CNN-Transformer Architecture for Efficient Large-Scale Video Snapshot Compressive Imaging
* Hybrid Cross-View Attention Network for Lightweight Stereo Image Super-Resolution
* Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
* Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
* Hybrid Reinforcement Learning-Based Method for Generating Privacy-Preserving Trajectories in Low-Density Traffic Environments, A
* Hybrid Segmentation Approach for Tumors Detection in Brain Using Machine Learning Algorithms
* Hybrid Shape Deformation for Face Reconstruction in Aesthetic Orthodontics
* HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
* Hydrological Cycle in the Arabian Sea Region from GRACE/GRACE-FO Missions and ERA5 Data
* Hyper-Anchor Based Lane Detection
* Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
* Hyperbolic Anomaly Detection
* Hyperbolic Learning with Synthetic Captions for Open-World Detection
* HyperBT: Redundancy Reduction-Based Self-Supervised Learning for Hyperspectral Image Classification
* HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
* Hypergraph Representation Learning for Remote Sensing Image Change Detection
* HyperKon: A Self-Supervised Contrastive Network for Hyperspectral Image Analysis
* HyperLeaf2024: A Hyperspectral Imaging Dataset for Classification and Regression of Wheat Leaves
* HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
* Hyperspectral Image Transects during Transient Events in Rivers (HITTER): Framework Development and Application to a Tracer Experiment on the Missouri River, USA
* Hyperspectral Imaging for Phenotyping Plant Drought Stress and Nitrogen Interactions Using Multivariate Modeling and Machine Learning Techniques in Wheat
* Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
* I'M HOI: Inertia-Aware Monocular Capture of 3D Human-Object Interactions
* i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?
* IAD-Net: Single-Image Dehazing Network Based on Image Attention
* IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
* ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
* ICP-Flow: LiDAR Scene Flow Estimation with ICP
* ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
* ICTH: Local-to-Global Spectral Reconstruction Network for Heterosource Hyperspectral Images
* ID-Blau: Image Deblurring by Implicit Diffusion-Based reBLurring AUgmentation
* ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
* IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
* IDENet: Implicit Degradation Estimation Network for Efficient Blind Super Resolution
* Identification and Analysis of Ecological Corridors in the Central Urban Area of Xuchang Based on Multi-Source Geospatial Data
* Identification and Characterization of Reclaimed and Underclaimed Mine Features Using Lidar and Temporal Remote Sensing Methods within the Coastal Plain Uranium Mining Region of Texas
* Identification of Internal Tides in ECCO Estimates of Sea Surface Salinity in the Andaman Sea
* Identification of Potential Landslides in the Gaizi Valley Section of the Karakorum Highway Coupled with TS-InSAR and Landslide Susceptibility Analysis
* Identification of Spatial Distribution of Afforestation, Reforestation, and Deforestation and Their Impacts on Local Land Surface Temperature in Yangtze River Delta and Pearl River Delta Urban Agglomerations of China
* Identifying Conservation Priority Areas of Hydrological Ecosystem Service Using Hot and Cold Spot Analysis at Watershed Scale
* Identifying Determinants of Spatiotemporal Disparities in Ecological Quality of Mongolian Plateau
* Identifying Important Group of Pixels using Interactions
* IDGuard: Robust, General, Identity-Centric POI Proactive Defense Against Face Editing Abuse
* iEdit: Localised Text-guided Image Editing with Weak Supervision
* IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution
* IGM-MELv2: Infrared Guiding Modal Multiuser Eye Localization System on ARM CPU for Autostereoscopic Displays
* IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
* iKUN: Speak to Trackers Without Retraining
* Illuminant Equivariant Networks for Computational Color Constancy
* Image Hiding Based on Compressive Autoencoders and Normalizing Flow
* Image Manipulation Detection With Cascade Hierarchical Graph Representation
* Image Neural Field Diffusion Models
* Image Processing GNN: Breaking Rigidity in Super-Resolution
* Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance
* Image restoration refinement with Uformer GAN
* Image Sculpting: Precise Object Editing with 3D Geometry Control
* Image-caption difficulty for efficient weakly-supervised object detection from in-the-wild data
* Image-Level Adaptive Adversarial Ranking for Person Re-Identification
* Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
* Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
* ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
* Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
* Imaging Signal Recovery Using Neural Network Priors Under Uncertain Forward Model Parameters
* Imbalance-Aware Discriminative Clustering for Unsupervised Semantic Segmentation
* IMIL: Interactive Medical Image Learning Framework
* Impact of Airbnb on Long-Term Rental Markets in San Francisco: A Geospatial Analysis Using Multiscale Geographically Weighted Regression, The
* Impact of Assimilating Geostationary Interferometric Infrared Sounder Observations from Long- and Middle-Wave Bands on Weather Forecasts with a Locally Cloud-Resolving Global Model
* Impact of Long-Term Drought on Surface Water and Water Balance Variations in Iran: Insights from Highland and Lowland Regions
* Impact of Smartphone Activity on Pedestrian Safety: A Case Study in Seoul
* Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks
* Implicit Assimilation of Sparse In Situ Data for Dense & Global Storm Surge Forecasting
* Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
* Implicit Event-RGBD Neural SLAM
* Implicit Motion Function
* ImplicitTerrain: a Continuous Surface Model for Terrain Data Analysis
* IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
* Improved Baselines with Visual Instruction Tuning
* Improved Crop and Weed Detection with Diverse Data Ensemble Learning
* Improved Implicit Neural Representation with Fourier Reparameterized Training
* Improved Methods for Retrieval of Chlorophyll Fluorescence from Satellite Observation in the Far-Red Band Using Singular Value Decomposition Algorithm
* Improved NSGAII for Integrated Container Scheduling Problems With Two Transshipment Routes, An
* Improved Population Mapping for China Using the 3D Building, Nighttime Light, Points-of-Interest, and Land Use/Cover Data within a Multiscale Geographically Weighted Regression Model
* Improved Self-Training for Test-Time Adaptation
* Improved Small Object Detection Algorithm Based on YOLOv5
* Improved Visual Grounding through Self-Consistent Explanations
* Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
* Improving Bird's Eye View Semantic Segmentation by Task Decomposition
* Improving Consistency in Cardiovascular Disease Risk Assessment: Cross-Camera Adaptation for Retinal Images
* Improving Depth Completion via Depth Feature Upsampling
* Improving Distant 3D Object Detection Using 2D Box Supervision
* Improving End-to-End Sign Language Translation With Adaptive Video Representation Enhanced Transformer
* Improving Generalization via Meta-Learning on Hard Samples
* Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
* Improving Graph Contrastive Learning via Adaptive Positive Sampling
* Improving Image Restoration Through Removing Degradations in Textual Representations
* Improving Image-Text Matching by Integrating Word Sense Disambiguation
* Improving Object Detection to Fisheye Cameras with Open-Vocabulary Pseudo-Label Approach
* Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
* Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
* Improving Plasticity in Online Continual Learning via Collaborative Learning
* Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
* Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
* Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
* Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
* Improving the Efficiency-Accuracy Trade-off of DETR-Style Models in Practice
* Improving the ERA5-Land Temperature Product through a Deep Spatiotemporal Model That Uses Fused Multi-Source Remote Sensing Data
* Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
* Improving the Robustness of 3D Human Pose Estimation: A Benchmark Dataset and Learning from Noisy Input
* Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
* Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
* Improving Unsupervised Hierarchical Representation With Reinforcement Learning
* Improving Urban Travel Time Estimation Using Gaussian Mixture Models
* Improving Valence-Arousal Estimation with Spatiotemporal Relationship Learning and Multimodal Fusion
* Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
* Improving Visual Representations of Masked Autoencoders With Artifacts Suppression
* In Memoriam: Xiaoou Tang
* In Search of a Data Transformation that Accelerates Neural Field Training
* In-Context Matting
* In-Distribution Public Data Synthesis With Diffusion Models for Differentially Private Image Classification
* In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
* in2IN: Leveraging individual Information to Generate Human INteractions
* In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
* InceptionNeXt: When Inception Meets ConvNeXt
* Incorporating Effects of Slope Units and Sliding Areas into Seismically Induced Landslide Risk Modeling in Tectonically Active Mountainous Areas
* Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
* Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
* Incremental Residual Concept Bottleneck Models
* Indian Traffic Sign Detection and Classification Through a Unified Framework
* Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
* Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
* InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
* Influence of Vaccine Willingness on Epidemic Spreading in Social Networks, The
* Infrared Adversarial Car Stickers
* Infrared Small Target Detection with Scale and Location Sensitivity
* Infrared Weak Target Detection in Dual Images and Dual Areas
* Initial Phase Coding With Two-Dimensional Local Low Sidelobes for Suppression of Active Forwarding Signals
* Initialization Matters for Adversarial Transfer Learning
* Initno: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
* Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
* Inlier Confidence Calibration for Point Cloud Registration
* InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360° Neural Radiance Fields
* InSAR Integrated Machine Learning Approach for Landslide Susceptibility Mapping in California
* Insect-Foundation: A Foundation Model and Large-Scale 1M Dataset for Visual Insect Understanding
* Insights from the Use of Previously Unseen Neural Architecture Search Datasets
* InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
* Instance Tracking in 3D Scenes from Egocentric Videos
* Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
* Instance-Aware Contrastive Learning for Occluded Human Mesh Reconstruction
* Instance-Aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
* Instance-Aware Group Quantization for Vision Transformers
* Instance-based Cyclegan for Object Segmentation with Few Annotations
* Instance-based Max-margin for Practical Few-shot Recognition
* Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
* InstanceDiffusion: Instance-Level Control for Image Generation
* Instant3D: Instant Text-to-3D Generation
* Instantaneous Perception of Moving Objects in 3D
* InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
* Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
* Instruct-Imagen: Image Generation with Multi-modal Instruction
* Instruct-ReID: A Multi-Purpose Person Re-Identification Task with Instructions
* InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
* InstructVideo: Instructing Video Diffusion Models with Human Feedback
* Integrating Efficient Optimal Transport and Functional Maps for Unsupervised Shape Correspondence Learning
* Integrating Language-Derived Appearance Elements With Visual Cues in Pedestrian Detection
* Integrating Sequential Backward Selection (SBS) and CatBoost for Snow Avalanche Susceptibility Mapping at Catchment Scale
* Integrating the Safety Control Against Cyber-Attacks on the Global Information in Coupled Map Car-Following Model Under Connected Vehicles Platoon Environment
* Integrating Thepade SBTC and Niblack thresholding features for identification of land usage from aerial images using ensemble of machine learning algorithms
* Intelligent Assessment of Pavement Structural Conditions: A Novel FeMViT Classification Network for GPR Images
* Intelligent Grimm: Open-ended Visual Storytelling via Latent Diffusion Models
* Intelligent Reflective Surface Assisted Integrated Sensing and Wireless Power Transfer
* Intensity-Robust Autofocus for Spike Camera
* Inter-X: Towards Versatile Human-Human Interaction Analysis
* InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
* Interactive Continual Learning: Fast and Slow Thinking
* Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation
* Interactive Generative Adversarial Networks With High-Frequency Compensation for Facial Attribute Editing
* Interactive Navigation Method with Effect-oriented Affordance, An
* Interactive Spectral-Spatial Transformer for Hyperspectral Image Classification
* Interactive3D: Create What You Want by Interactive 3D Generation
* Interannual Glacial Mass Changes in High Mountain Asia and Connections to Climate Variability
* InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
* InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
* Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
* Interpreting COVID Lateral Flow Tests' Results with Foundation Models
* Intersensor Calibration of Spaceborne Passive Microwave Radiometers and Algorithm Tuning for Long-Term Sea Ice Trend Analysis Based on AMSR-E Observations
* Intraoperative 2D/3D Image Registration via Differentiable X-Ray Rendering
* Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
* Intrinsic Image Decomposition Based on Retinex Theory, Superpixel Segmentation and Scale-space Computations
* Intrinsic Image Diffusion for Indoor Single-view Material Estimation
* IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
* Introduction to the special issue on Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)
* Introduction to the special section Advances trends of pattern recognition for intelligent systems applications (SS:ISPR23)
* Inventory and Spatial Distribution of Landslides on the Eastern Slope of Gongga Mountain, Southwest China
* InVERGe: Intelligent Visual Encoder for Bridging Modalities in Report Generation
* Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
* Inversion-Free Image Editing with Language-Guided Diffusion Models
* Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
* Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
* Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
* Investigating Resident-Tourist Sharing of Urban Public Recreation Space and Its Influencing Factors
* Investigating Spatial Effects through Machine Learning and Leveraging Explainable AI for Child Malnutrition in Pakistan
* Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
* Investigation and Validation of Split-Window Algorithms for Estimating Land Surface Temperature from Landsat 9 TIRS-2 Data
* Investigation into the Impact of AI-Powered Image Enhancement on Forensic Facial Recognition, An
* Investigation of Image Features for Perceptually Equivalent Gloss Reproduction Through Comparison of Real Objects and Images
* InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues
* IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
* IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
* IReNe: Instant Recoloring of Neural Radiance Fields
* IRFNet: Skin Lesion Detection and Classification Using Unified Intuitive and Object Classifier with Iterative Random Forest Algorithm
* IrrNet: Advancing Irrigation Mapping with Incremental Patch Size Training on Remote Sensing Imagery
* IrrNet: Spatio-Temporal Segmentation guided Classification for Irrigation Mapping
* Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
* Is It Reliable to Extract Gully Morphology Parameters Based on High-Resolution Stereo Images? A Case of Gully in a Soil-Rock Dual Structure Area
* Is Our Continual Learner Reliable? Investigating Its Decision Attribution Stability through SHAP Value Consistency
* Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
* Is Vanilla MLP in Neural Radiance Field Enough for Few-Shot View Synthesis?
* IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
* ISSR-DIL: Image Specific Super-Resolution Using Deep Identity Learning
* It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
* Iterated Learning Improves Compositionality in Large Vision-Language Models
* Iterative Algorithm for Quaternion Eigenvalue Problems in Signal Processing, An
* Iterative Mamba Diffusion Change-Detection Model for Remote Sensing
* Iterative Motion Compensation Algorithm for Synthetic Aperture Passive Positioning, An
* Iterative Optimization-Enhanced Contrastive Learning for Multimodal Change Detection
* iToF-Flow-Based High Frame Rate Depth Imaging
* Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
* JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
* JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
* JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
* Joint Energy and Completion Time Difference Minimization for UAV-Enabled Intelligent Transportation Systems: A Constrained Multi-Objective Optimization Approach
* Joint Motion Detection in Neural Videos Training
* Joint Multimodal Transformer for Emotion Recognition in the Wild
* Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach
* Joint Optimization of UAV Trajectory and Communication Resources With Complete Avoidance of No-Fly-Zones
* Joint Optimization of UAV's Height and Peak Optical Intensity in Weather-Dependent Covert VLC
* Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
* Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
* Joint Weakly Supervised Image Emotion Analysis Based on Interclass Discrimination and Intraclass Correlation
* Joint-Task Regularization for Partially Labeled Multi-Task Learning
* Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
* Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
* JointSQ: Joint Sparsification-Quantization for Distributed Learning
* JRDB-PanoTrack: An Open-World Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
* JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
* Just Add pi! Pose Induced Video Transformers for Understanding Activities of Daily Living
* Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces
* Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
* KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
* Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
* Key Patches Are All You Need: A Multiple Instance Learning Framework For Robust Medical Diagnosis
* KeyPoint Relative Position Encoding for Face Recognition
* KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections
* KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
* Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
* Knowledge Distillation for Efficient Instance Semantic Segmentation with Transformers
* Knowledge-Enhanced Dual-Stream Zero-Shot Composed Image Retrieval
* Koala: Key Frame-Conditioned Long Video-LLM
* Koopman-Based Hybrid Modeling and Zonotopic Tube Robust MPC for Motion Control of Automated Vehicles
* KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
* KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
* KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
* KVQ: Kwai Video Quality Assessment for Short-form Videos
* L-MAGIC: Language Model Assisted Generation of Images with Coherence
* L0-Sampler: An L0 Model Guided Volume Sampling for NeRF
* L2B: Learning to Bootstrap Robust Models for Combating Label Noise
* L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
* LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
* Label Efficient Lifelong Multi-View Broiler Detection
* Label Propagation for Zero-shot Classification with Vision-Language Models
* Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
* Label-free Anomaly Detection in Aerial Agricultural Images with Masked Image Modeling
* Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis
* LaDiffGAN: Training GANs with Diffusion Supervision in Latent Spaces
* LAENeRF: Local Appearance Editing for Neural Radiance Fields
* LAformer: Trajectory Prediction for Autonomous Driving with Lane-Aware Scene Constraints
* LAFS: Landmark-Based Facial Self-Supervised Learning for Face Recognition
* LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
* LAM-Depth: Laplace-Attention Module-Based Self-Supervised Monocular Depth Estimation
* LAMP: Learn A Motion Pattern for Few-Shot Video Generation
* LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
* LAN: Learning to Adapt Noise for Image Denoising
* Landmark-Based Vehicle Self-Localization Using Automotive Polarimetric Radars
* Landslide Recognition Based on Machine Learning Considering Terrain Feature Fusion
* Landslide Risk Assessments through Multicriteria Analysis
* Landslide Susceptibility Assessment in Yulong County Using Contribution Degree Clustering Method and Stacking Ensemble Coupled Model Based on Certainty Factor
* Lane2Seq: Towards Unified Lane Detection via Sequence Generation
* LaneCPP: Continuous 3D Lane Detection Using Physical Priors
* LangSplat: 3D Language Gaussian Splatting
* Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
* Language Model Guided Interpretable Video Action Reasoning
* Language Model Personalization for Speech Recognition: A Clustered Federated Learning Approach With Adaptive Weight Average
* Language Models as Black-Box Optimizers for Vision-Language Models
* Language-aware Visual Semantic Distillation for Video Question Answering
* Language-Conditioned Detection Transformer
* Language-driven All-in-one Adverse Weather Removal
* Language-Driven Anchors for Zero-Shot Adversarial Robustness
* Language-driven Grasp Detection
* Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
* Language-guided Image Reflection Separation
* Language-guided Multi-modal Emotional Mimicry Intensity Estimation
* Language-only Efficient Training of Zero-shot Composed Image Retrieval
* LaPA: Latent Prompt Assist Model for Medical Visual Question Answering
* Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
* LaRE2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
* Large Kernel Frequency-enhanced Network for Efficient Single Image Super-Resolution
* Large Language Models are Good Prompt Learners for Low-Shot Image Classification
* Large Language Models in Wargaming: Methodology, Application, and Robustness
* Large-Scale Bidirectional Training for Zero-Shot Image Captioning
* Large-scale Dataset Pruning with Dynamic Uncertainty
* Large-Scale High-Altitude UAV-Based Vehicle Detection via Pyramid Dual Pooling Attention Path Aggregation Network
* LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
* LASIL: Learner-Aware Supervised Imitation Learning For Long-Term Microscopic Traffic Simulation
* LASO: Language-Guided Affordance Segmentation on 3D Object
* Latency Correction for Event-Guided Deblurring and Frame Interpolation
* Latent Flow Diffusion for Deepfake Video Generation
* Latent Modulated Function for Computational Optimal Continuous Image Representation
* Latent-based Diffusion Model for Long-tailed Recognition
* LatentMan: Generating Consistent Animated Characters using Image Diffusion Models
* Layered Modeling of Affective, Perception, and Visual Properties: Optimizing Structure With Genetic Algorithm
* Layered Semantic Communication System for Dynamic Scenarios
* Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
* LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
* LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
* LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
* LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
* LDPC Code-Based Distributed Source Coding With an Efficient Message Passing Mechanism for the Compression of Correlated Image Sources
* LEAD: Exploring Logit Space Evolution for Model Selection
* LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
* Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning
* LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
* Learn from View Correlation: An Anchor Enhancement Strategy for Multi-View Clustering
* Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
* Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
* Learnable Global Spatio-Temporal Adaptive Aggregation for Bracketing Image Restoration and Enhancement
* Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain
* Learned Lossless Image Compression Based on Bit Plane Slicing
* Learned Representation-Guided Diffusion Models for Large-Image Generation
* Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
* Learned Trajectory Embedding for Subspace Clustering
* Learning a Non-Locally Regularized Convolutional Sparse Representation for Joint Chromatic and Polarimetric Demosaicking
* Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
* Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
* Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
* Learning CNN on ViT: A Hybrid Model to Explicitly Class-Specific Boundaries for Domain Adaptation
* Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
* Learning Continuous 3D Words for Text-to-Image Generation
* Learning Correlation Structures for Vision Transformers
* Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
* Learning Degradation-Independent Representations for Camera ISP Pipelines
* Learning Degradation-Unaware Representation with Prior-Based Latent Transformations for Blind Face Restoration
* Learning Diffusion Texture Priors for Image Restoration
* Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
* Learning Discriminative Features via Multi-Hierarchical Mutual Information for Unsupervised Point Cloud Registration
* Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
* Learning Driver-Irrelevant Features for Generalizable Driver Behavior Recognition
* Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
* Learning Equi-Angular Representations for Online Continual Learning
* Learning for Transductive Threshold Calibration in Open-World Recognition
* Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
* Learning from One Continuous Video Stream
* Learning from Synthetic Human Group Activities
* Learning Geometric Information via Transformer Network for Key-Points Based Motion Segmentation
* Learning Group Activity Features Through Person Attribute Prediction
* Learning Inclusion Matching for Animation Paint Bucket Colorization
* Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
* Learning Intra-View and Cross-View Geometric Knowledge for Stereo Matching
* Learning Large-Factor EM Image Super-Resolution with Generative Priors
* Learning Multi-Dimensional Human Preference for Text-to-Image Generation
* Learning Object State Changes in Videos: An Open-World Perspective
* Learning Occupancy for Monocular 3D Object Detection
* Learning Optimized Low-Light Image Enhancement for Edge Vision Tasks
* Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
* Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
* Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
* Learning Structure-From-Motion with Graph Attention Networks
* Learning Surface Terrain Classifications from Ground Penetrating Radar
* Learning the 3D Fauna of the Web
* Learning to Classify New Foods Incrementally Via Compressed Exemplars
* Learning to Control Camera Exposure via Reinforcement Learning
* Learning to Count Without Annotations
* Learning to Generate Parameters of ConvNets for Unseen Image Data
* Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
* Learning to Navigate Efficiently and Precisely in Real Environments
* Learning to Predict Activity Progress by Self-Supervised Video Alignment
* Learning to Produce Semi-Dense Correspondences for Visual Localization
* Learning to Rank Patches for Unbiased Image Redundancy Reduction
* Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
* Learning to Remove Wrinkled Transparent Film with Polarized Prior
* Learning to Schedule Resistant to Adversarial Attacks in Diffusion Probabilistic Models Under the Threat of Lipschitz Singularities
* Learning to Segment Referred Objects from Narrated Egocentric Videos
* Learning to Select Views for Efficient Multi-View Understanding
* Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data
* Learning to Transform Dynamically for Better Adversarial Transferability
* Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
* Learning Tracking Representations from Single Point Annotations
* Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining
* Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation
* Learning Transferable Negative Prompts for Out-of-Distribution Detection
* Learning Triangular Distribution in Visual World
* Learning Vision from Models Rivals Learning Vision from Data
* Learning Visual Prompt for Gait Recognition
* Learning with Structural Labels for Learning with Noisy Labels
* Learning With Style: Continual Semantic Segmentation Across Tasks and Domains
* Learning with Unreliability: Fast Few-Shot Voxel Radiance Fields with Relative Geometric Consistency
* Learning without Exact Guidance: Updating Large-Scale High-Resolution Land Cover Maps from Low-Resolution Historical Labels
* LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
* LEDITS++: Limitless Image Editing Using Text-to-Image Models
* LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
* LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
* LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
* LENAS: Learning-Based Neural Architecture Search and Ensemble for 3-D Radiotherapy Dose Prediction
* LEOD: Label-Efficient Object Detection for Event Cameras
* Let me show you how it's done: Cross-modal knowledge distillation as pretext task for semantic segmentation
* Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
* Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
* Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
* Leveraging Frame Affinity for sRGB-to-RAW Video De-Rendering
* Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
* Leveraging Large Language Models for Multimodal Search
* Leveraging Machine Learning and Remote Sensing for Water Quality Analysis in Lake Ranco, Southern Chile
* Leveraging Pre-trained Multi-task Deep Models for Trustworthy Facial Analysis in Affective Behaviour Analysis in-the-Wild
* Leveraging Predicate and Triplet Learning for Scene Graph Generation
* Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
* LGAfford-Net: A Local Geometry Aware Affordance Detection Network for 3D Point Clouds
* LGFN: Lightweight Light Field Image Super-Resolution using Local Convolution Modulation and Global Attention Feature Extraction
* LGTrack: Exploiting Local and Global Properties for Robust Visual Tracking
* LiDAR-Based Person Re-Identification
* LiDAR-Net: A Real-Scanned 3D Point Cloud Dataset for Indoor Scenes
* Lidar-Observed Diel Vertical Variations of Inland Chlorophyll a Concentration
* LiDAR4D: Dynamic Neural Fields for Novel Space-Time View LiDAR Synthesis
* LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
* Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
* Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
* Lifting Multi-View Detection and Tracking to the Bird's Eye View
* Light Fields Stitching for Windowed-6DoF VR Content
* Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
* LightIt: Illumination Modeling and Control for Diffusion Models
* LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
* lightweight attention-driven distillation model for human pose estimation, A
* Lightweight Maize Disease Detection through Post-Training Quantization with Similarity Preservation
* Lightweight Multitask Learning for Robust JND Prediction Using Latent Space and Reconstructed Frames
* Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera, A
* Linguistic-Aware Patch Slimming Framework for Fine-Grained Cross-Modal Alignment
* Link Aggregation for Skip Connection-Mamba: Remote Sensing Image Segmentation Network Based on Link Aggregation Mamba
* Link between Surface Visible Light Spectral Features and Water-Salt Transfer in Saline Soils: Investigation Based on Soil Column Laboratory Experiments, The
* Link-Context Learning for Multimodal LLMs
* Linkage-Based Object Re-Identification via Graph Learning
* LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
* LiOSR-SAR: Lightweight Open-Set Recognizer for SAR Imageries
* LiSA: LiDAR Localization with Semantic Awareness
* LISA: Reasoning Segmentation via Large Language Model
* Listen Then See: Video Alignment with Speaker Attention
* LiveHPS: LiDAR-Based Scene-Level Human Pose and Shape Estimation in Free Environment
* Livestock Detection and Counting in Kenyan Rangelands Using Aerial Imagery and Deep Learning Techniques
* Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
* LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
* LLaFS: When Large Language Models Meet Few-Shot Segmentation
* LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
* LLM-Guided Cross-Modal Point Cloud Quality Assessment: A Graph Learning Approach
* LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
* LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
* LLMs are Good Action Recognizers
* LLMs are Good Sign Language Translators
* LMDrive: Closed-Loop End-to-End Driving with Large Language Models
* Local Extremum Constrained Total Variation Model for Natural and Hyperspectral Image Non-Blind Deblurring
* Local Weather and Global Climate Data-Driven Long-Term Runoff Forecasting Based on Local-Global-Temporal Attention Mechanisms and Graph Attention Networks
* Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
* Localised-NeRF: Specular Highlights and Colour Gradient Localising in NeRF
* Localization is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix it
* Localized Linear Temporal Dynamics for Self-Supervised Skeleton Action Recognition
* Locally Adaptive Neural 3D Morphable Models
* LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
* LoCoNet: Long-Short Context Network for Active Speaker Detection
* Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
* LOFI: LOng-tailed FIne-Grained Network for Food Recognition
* Logarithmic Lenses: Exploring Log RGB Data for Image Classification
* LogicAL: Towards logical anomaly synthesis for unsupervised anomaly localization
* Logit Standardization in Knowledge Distillation
* Long-Tail Class Incremental Learning via Independent Sub-Prototype Construction
* Long-Tailed Anomaly Detection with Learnable Class Names
* Long-Term Energy Consumption Minimization in NOMA-Enabled Vehicular Edge Computing Networks
* Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
* Look-Up Table Compression for Efficient Image Restoration
* Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
* Looking 3D: Anomaly Detection with 2D-3D Alignment
* Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
* Loopy-SLAM: Dense Neural SLAM with Loop Closures
* Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
* LORS: Low-Rank Residual Structure for Parameter-Efficient Network Stacking
* LoS: Local Structure-Guided Stereo Matching
* LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
* Lost in Translation: Lip-Sync Deepfake Detection from Audio-Video Mismatch
* Lotus: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
* Low Latency Point Cloud Rendering with Learned Splatting
* Low-Cost and Lightweight Real-Time Object-Detection Method Based on UAV Remote Sensing in Transportation Systems, A
* Low-Latency Neural Stereo Streaming
* Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
* Low-power, Continuous Remote Behavioral Localization with Event Cameras
* Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
* Low-Rank Few-Shot Adaptation of Vision-Language Models
* Low-Rank Knowledge Decomposition for Medical Foundation Models
* Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
* Low-Rank Tensor and Hybrid Smoothness Regularization-Based Approach for Traffic Data Imputation With Multimodal Missing
* Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
* Low-Resolution-Only Microscopy Super-Resolution Models Generalizing to Non-Periodicities at Atomic Scale
* Low-Resource Vision Challenges for Foundation Models
* LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-Based 3D Semantic Occupancy Prediction
* LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
* LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
* LQMFormer: Language-Aware Query Mask Transformer for Referring Image Segmentation
* LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
* LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
* LTGC: Long-Tail Recognition via Leveraging LLMs-Driven Generated Content
* LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-Time Rendering
* LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
* Lunar Exploration Based on Ground-Based Radar: Current Research Progress and Future Prospects
* LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
* LVS: A Learned Video Storage for Fast and Efficient Video Understanding
* M&M VTO: Multi-Garment Virtual Try-On and Editing
* M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection
* M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
* MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
* MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
* Ma2SP: Missing-Aware Prompting With Modality-Adaptive Integration for Incomplete Multi-Modal Survival Prediction
* MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
* MACE: Mass Concept Erasure in Diffusion Models
* Machine Learning Modelling for Soil Moisture Retrieval from Simulated NASA-ISRO SAR (NISAR) L-Band Data
* MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
* MAFA: Managing False Negatives for Vision-Language Pre-Training
* MAFNet: Multimodal Asymmetric Fusion Network for Radar Echo Extrapolation
* MaGGIe: Masked Guided Gradual Human Instance Matting
* Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
* MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
* MAGICK: A Large-Scale Captioned Dataset from Matting Generated Images Using Chroma Keying
* Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
* Make Pixels Dance: High-Dynamic Video Generation
* Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
* Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
* Makeup Prior Models for 3D Facial Makeup Estimation and Applications
* Making use of unlabeled data: Comparing strategies for marine animal detection in long-tailed datasets using self-supervised and semi-supervised pre-training
* Making Vision Transformers Truly Shift-Equivariant
* Making Visual Sense of Oracle Bones for You and Me
* MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
* Manga Whisperer: Automatically Generating Transcriptions for Comics, The
* ManiCLIP: Multi-attribute Face Manipulation from Text
* Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise
* Manifold-Based Incomplete Multi-View Clustering via Bi-Consistency Guidance
* ManiFPT: Defining and Analyzing Fingerprints of Generative Models
* ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
* MANUS: Markerless Grasp Capture Using Articulated 3D Gaussians
* Map-Relative Pose Regression for Visual Re-Localization
* MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
* MapGen-Diff: An End-to-End Remote Sensing Image to Map Generator via Denoising Diffusion Bridge Model
* MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
* Mapping Building Heights at Large Scales Using Sentinel-1 Radar Imagery and Nighttime Light Data
* Mapping Earth Hummocks in Daisetsuzan National Park in Japan Using UAV-SfM Framework
* Mapping Field-Level Maize Yields in Ethiopian Smallholder Systems Using Sentinel-2 Imagery
* Mapping Fruit-Tree Plantation Using Sentinel-1/2 Time Series Images with Multi-Index Entropy Weighting Dynamic Time Warping Method
* Mapping Geospatial AI Flood Risk in National Road Networks
* Mapping Localization Preferences for Residential Buildings
* Mapping Natural Populus euphratica Forests in the Mainstream of the Tarim River Using Spaceborne Imagery and Google Earth Engine
* Mapping the Influence of Olympic Games' Urban Planning on the Land Surface Temperatures: An Estimation Using Landsat Series and Google Earth Engine
* Mapping the Spatial and Seasonal Details of Heat Health Risks in Different Local Climate Zones: A Case Study of Shanghai, China
* MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
* Marine Radar Constant False Alarm Rate Detection in Generalized Extreme Value Distribution Based on Space-Time Adaptive Filtering Clutter Statistical Analysis
* MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
* MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
* MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion
* Mask Grounding for Referring Image Segmentation
* Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
* MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
* MaskClustering: View Consensus Based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
* Masked and Shuffled Blind Spot Denoising for Real-World Images
* Masked AutoDecoder is Effective Multi-Task Vision Generalist
* Masked Autoencoders are Secretly Efficient Learners
* Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
* Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
* MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
* MaskPLAN: Masked Generative Layout Planning from Partial Input
* MaskSim: Detection of synthetic images by masked spectrum similarity analysis
* Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
* Matching Anything by Segmenting Anything
* MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
* Material Palette: Extraction of Materials from a Single Image
* MatFuse: Controllable Material Generation with Diffusion Models
* Matrix Embedding Based Multiple Histograms Modification for Efficient Reversible Data Hiding
* MatSynth: A Modern PBR Materials Dataset
* Matting Anything
* Maximum Entropy Attack on Decision Fusion With Herding Behaviors
* MaxQ: Multi-Axis Query for N:M Sparsity Network
* MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
* MCFPN: Multi-Path Cross Fusion Pyramid-Like Network for Image Super-Resolution Reconstruction
* MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
* MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
* MDD-ShipNet: Math-Data Integrated Defogging for Fog-Occlusion Ship Detection
* MDFA-Net: Multi-Scale Differential Feature Self-Attention Network for Building Change Detection in Remote Sensing Images
* MDSCNN: Remote Sensing Image Spatial-Spectral Fusion Method via Multi-Scale Dual-Stream Convolutional Neural Network
* MeaCap: Memory-Augmented Zero-shot Image Captioning
* Mean-Shift Feature Transformer
* Measuring Biophysical Parameters of Wheat Canopy with MHz- and GHz-Frequency Range Impulses Employing Contactless GPR
* MECFNet: Reconstruct Sharp Image for UAV-Based Crack Detection
* MedBN: Robust Test-Time Adaptation against Malicious Test Samples
* Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time Adaptation
* Medium Scale Benchmark for Cricket Excited Actions Understanding
* MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
* MEFSR-GAN: A Multi-Exposure Feedback and Super-Resolution Multitask Network via Generative Adversarial Networks
* MELFuSION: Synthesizing Music from Image and Language Cues Using Diffusion Models
* Melt Pond Evolution along the MOSAiC Drift: Insights from Remote Sensing and Modeling
* MemFlow: Optical Flow Estimation and Prediction with Memory
* MemoNav: Working Memory Model for Visual Navigation
* Memory-based Adapters for Online 3D Scene Perception
* Memory-Scalable and Simplified Functional Map Learning
* MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
* MESA: Matching Everything by Segmenting Anything
* MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
* MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
* Meta-learning Approaches for Few-Shot Learning: A Survey of Recent Advances
* Meta-learning from learning curves for budget-limited algorithm selection
* Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
* MetaCloak: Preventing Unauthorized Subject-Driven Text-to-Image Diffusion-Based Synthesis via Meta-Learning
* Metaverse Meets Intelligent Transportation System: An Efficient and Instructional Visual Perception Framework
* Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning, A
* Methods for Assessing the Effectiveness of Modern Counter Unmanned Aircraft Systems
* MFH-Net: A Hybrid CNN-Transformer Network Based Multi-Scale Fusion for Medical Image Segmentation
* MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
* MG-GCT: A Motion-Guided Graph Convolutional Transformer for Traffic Gesture Recognition
* MGINS: A Lane-Level Localization System for Challenging Urban Environments Using Magnetic Field Matching/GNSS/INS Fusion
* MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
* MICap: A Unified Model for Identity-Aware Movie Descriptions
* micro Reinforcement Learning architecture for Intrusion Detection Systems, A
* MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
* MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
* Microphysical Characteristics of Monsoon Precipitation over Yangtze-and-Huai River Basin and South China: A Comparative Study from GPM DPR Observation
* MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
* MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
* MIMIC: Masked Image Modeling with Image Correspondences
* MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
* Mind Artist: Creating Artistic Snapshots with Human Thought
* Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation
* Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
* MindBridge: A Cross-Subject Brain Decoding Framework
* Mini-Satellite Fucheng 1 SAR: Interferometry to Monitor Mining-Induced Subsidence and Comparative Analysis with Sentinel-1
* Minimal Perspective Autocalibration
* Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
* Mip-Splatting: Alias-Free 3D Gaussian Splatting
* MIPI 2024 Challenge on Demosaic for Hybridevs Camera: Methods and Results
* MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results
* MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results
* MirageRoom: 3D Scene Segmentation with 2D Pre-Trained Models by Mirage Projection
* Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
* Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes, The
* Misalignment-Robust Frequency Distribution Loss for Image Transformation
* Mission Planning and Trajectory Optimization in UAV Swarm for Track Deception against Radar Network
* Mitigating Bias Using Model-Agnostic Data Attribution
* Mitigating Challenges of the Space Environment for Onboard Artificial Intelligence: Design Overview of the Imaging Payload on SpIRIT
* Mitigating Disparate Elevation Differences between Adjacent Topobathymetric Data Models Using Binary Code
* Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
* Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
* Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning Through Object Exchange
* Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
* Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
* MixStyle-Based Contrastive Test-Time Adaptation: Pathway to Domain Generalization
* MixSyn: Compositional Image Synthesis with Fuzzy Masks and Style Fusion
* MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
* MLP Can Be a Good Transformer Learner
* MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
* MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
* MMA-Diffusion: MultiModal Attack on Diffusion Models
* MMA: Multi-Modal Adapter for Vision-Language Models
* MMCert: Provable Defense Against Adversarial Attacks to Multi-Modal Models
* MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems
* MMM: Generative Masked Motion Model
* MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
* MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
* MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
* mmWave Sensor and Camera Fusion System for Indoor Occupancy Detection and Tracking, A
* Mobile Aware Denoiser Network (MADNet) for Quad Bayer Images
* Mobile Devices in Forest Mensuration: A Review of Technologies and Methods in Single Tree Measurements
* MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
* Mocap Everyone Everywhere: Lightweight Motion Capture with Smartwatches and a Head-Mounted Camera
* MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints
* MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
* MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation
* Modal Excitation in Feedback Delay Networks
* Modality-Agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
* Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
* Modality-Aware Heterogeneous Graph for Joint Video Moment Retrieval and Highlight Detection
* Modality-Collaborative Test-Time Adaptation for Action Recognition
* ModaVerse: Efficiently Transforming Modalities with LLMs
* MoDE: CLIP Data Experts via Clustering
* Model Adaptation for Time Constrained Embodied Control
* Model Inversion Robustness: Can Transfer Learning Help?
* Model-Based Research on Performance Evaluation and Topology Optimization of Series-Parallel Lithium-Ion Battery Packs, A
* Model-guided contrastive fine-tuning for industrial anomaly detection
* Modeling and Robust H∞ Control Synthesis of the CAV-HDV Heterogeneous Traffic System With Different Car-Following Modes
* Modeling Collaborator: Enabling Subjective Vision Classification with Minimal Human Effort via LLM Tool-Use
* Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
* Modeling Detailed Human Geometry with Adaptive Local Refinement
* Modeling Hierarchical Structural Distance for Unsupervised Domain Adaptation
* Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
* Modeling of Multiple Spatial-Temporal Relations for Robust Visual Object Tracking
* Modified Hybrid Integration Algorithm for Moving Weak Target in Dual-Function Radar and Communication System
* Modular Blind Video Quality Assessment
* Modular control architecture for safe marine navigation: Reinforcement learning with predictive safety filters
* MoE-AGIQA: Mixture-of-Experts Boosted Visual Perception-Driven and Semantic-Aware Quality Assessment for AI-Generated Images
* MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision
* Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
* MoMask: Generative Masked Modeling of 3D Human Motions
* MoML: Online Meta Adaptation for 3D Human Motion Prediction
* Monitoring Outdoor Parking in Urban Areas With Unmanned Aerial Vehicles
* Monitoring Social Insect Activity with Minimal Human Supervision
* Monitoring Temporal Sandbar and Shoreline Changes at Saint Louis, Senegal: Using Sentinel-2 Imagery (2015-2022)
* Monkey: Image Resolution and Text Label are Important Things for Large Multi-Modal Models
* MonoCD: Monocular 3D Object Detection with Complementary Depths
* Monocular 6-DoF Pose Estimation of Spacecrafts Utilizing Self-iterative Optimization and Motion Consistency
* Monocular Identity-Conditioned Facial Reflectance Reconstruction
* MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
* MonoHair: High-Fidelity Hair Modeling from a Monocular Video
* MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
* MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views
* Monte Carlo-Based Restoration of Images Degraded by Atmospheric Turbulence
* MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-Wise Pruning Error Metric
* More You See in 2D, the More You Perceive in 3D, The
* MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
* Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
* MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
* Morphological Features of Severe Ionospheric Weather Associated with Typhoon Doksuri in 2023
* Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology
* Mosaic-SDF for 3D Generative Models
* MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
* MoST: Motion Style Transformer Between Diverse Action Contents
* MoST: Multi-modality Scene Tokenization for Motion Prediction
* Motion Blur Decomposition with Cross-shutter Guidance
* Motion Diversification Networks
* Motion-Adaptive Separable Collaborative Filters for Blind Motion Deblurring
* Motion-aware Needle Segmentation in Ultrasound Images
* Motion2VecSets: 4D Latent Vector Set Diffusion for Non-Rigid Shape Reconstruction and Tracking
* MotionEditor: Editing Video Motion via Content-Aware Diffusion
* Motorcyclist Helmet Violation Detection Framework by Leveraging Robust Ensemble and Augmentation Methods
* Move Anything with Layered Scene Diffusion
* Move as you Say, Interact as you can: Language-Guided Human Motion Generation with Scene Affordance
* MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
* MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images
* MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
* mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
* MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization
* MR-VNet: Media Restoration using Volterra Networks
* MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation
* MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
* MRFS: Mutually Reinforcing Image Fusion and Segmentation
* MS-DETR: Efficient DETR Training with Mixed Supervision
* MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
* MSU-4S: The Michigan State University Four Seasons Dataset
* MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning
* MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
* Mudslide: A Universal Nuclear Instance Segmentation Method
* MuGE: Multiple Granularity Edge Detection
* MuJo-SF: Multimodal Joint Slot Filling for Attribute Value Prediction of E-Commerce Commodities
* MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
* MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
* MULi-Ev: Maintaining Unperturbed LiDAR-Event Calibration
* Multi Criteria Methodology for Aircraft Trajectory Planning Algorithm Selection: A Survey
* Multi Model Ensemble for Compound Expression Recognition
* Multi-Agent Collaborative Perception via Motion-Aware Robust Communication Network
* Multi-Agent Long-Term 3D Human Pose Forecasting via Interaction-Aware Trajectory Conditioning
* Multi-angle Consistent Generative NeRF with Additive Angular Margin Momentum Contrastive Learning
* Multi-Attribute Interactions Matter for 3D Visual Grounding
* Multi-bit, Black-box Watermarking of Deep Neural Networks in Embedded Applications
* Multi-Criteria Token Fusion with One-Step-Ahead Attention for Efficient Vision Transformers
* Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection
* Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution
* Multi-Level Information Fusion Network With Edge Information Injection for Single-Band Cloud Detection
* Multi-Level Neural Scene Graphs for Dynamic Urban Environments
* Multi-Level Pixel-Wise Correspondence Learning for 6DoF Face Pose Estimation
* Multi-modal Aerial View Image Challenge: SAR Classification
* Multi-modal Aerial View Image Challenge: Sensor Domain Translation
* Multi-modal Arousal and Valence Estimation under Noisy Conditions
* Multi-Modal Fusion of Event and RGB for Monocular Depth Estimation Using a Unified Transformer-based Architecture
* Multi-Modal Hallucination Control by Visual Information Grounding
* Multi-Modal Hit Detection and Positional Analysis in Padel Competitions
* Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
* Multi-Modal Instruction Tuned LLMs with Fine-Grained Visual Perception
* Multi-Modal Learning for Geospatial Vegetation Forecasting
* Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
* Multi-Object Tracking in the Dark
* Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity
* Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
* Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
* Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
* Multi-scale Attention Network for Single Image Super-Resolution
* Multi-scale Attention-Based Inclination Angles Estimation for Panoramic Camera
* Multi-Scale Content-Structure Feature Extraction Network Applied to Gully Extraction, A
* Multi-Scale Context Fusion Network for Urban Solid Waste Detection in Remote Sensing Images
* Multi-Scale Contrastive Learning for Human Pose Estimation
* Multi-Scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
* Multi-Scale Explicit Matching and Mutual Subject Teacher Learning for Generalizable Person Re-Identification
* Multi-Scale Feature Fusion Based Lightweight Vehicle Target Detection Network on Aerial Optical Images, A
* Multi-Scale Feature Fusion using Channel Transformers for Guided Thermal Image Super Resolution
* Multi-Scale Learnable Gabor Transform for Pedestrian Trajectory Prediction From Different Perspectives
* Multi-scale occlusion suppression network for occluded person re-identification
* Multi-Scale Semantic Map Distillation for Lightweight Pavement Crack Detection
* Multi-Scale Spatio-Temporal Memory Network for Lightweight Video Denoising
* Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
* Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization
* Multi-Source Data-Driven Extraction of Urban Residential Space: A Case Study of the Guangdong-Hong Kong-Macao Greater Bay Area Urban Agglomeration
* Multi-Space Alignments Towards Universal LiDAR Segmentation
* Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments
* Multi-Task Dense Prediction via Mixture of Low-Rank Experts
* Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition
* Multi-Temporal Pixel-Based Compositing for Cloud Removal Based on Cloud Masks Developed Using Classification Techniques
* Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation
* Multi-View Action Recognition for Distracted Driver Behavior Localization
* Multi-View Aggregation Network for Dichotomous Image Segmentation
* Multi-View Attentive Contextualization for Multi-View 3D Object Detection
* Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
* Multi-Window Fusion Spatial-Frequency Joint Self-Attention for Remote-Sensing Image Super-Resolution
* Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
* Multiattention-Net: A Novel Approach to Face Anti-Spoofing with Modified Squeezed Residual Blocks
* Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
* MultiDiff: Consistent Novel View Synthesis from a Single Image
* Multidirectional Attention Fusion Network for SAR Change Detection
* MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
* Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation
* Multimodal Approach Integrating Convolutional and Recurrent Neural Networks for Alzheimer's Disease Temporal Progression Prediction, A
* Multimodal Attack Detection for Action Recognition Models
* Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
* Multimodal Integration of an Enhanced Novel Pulmonary Auscultation Real-Time Diagnostic System
* Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
* Multimodal Perception and Decision-Making Systems for Complex Roads Based on Foundation Models
* Multimodal Progressive Modulation Network for Micro-Video Multi-Label Classification
* Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
* Multimodal Representation Learning by Alternating Unimodal Adaptation
* Multimodal Sense-Informed Forecasting of 3D Human Motions
* Multimodal Understanding of Memes with Fair Explanations
* MultIOD: Rehearsal-free Multihead Incremental Object Detector
* MultiPanoWise: holistic deep architecture for multi-task dense prediction from a single panoramic image
* MultiPhys: Multi-Person Physics-Aware 3D Motion Estimation
* Multiplane Prior Guided Few-Shot Aerial Scene Rendering
* Multiple Targets ISAR Imaging Method with Removal of Micro-Motion Connection Based on Joint Constraints, A
* Multiple View Geometry Transformers for 3D Human Pose Estimation
* Multiple-Band Electric Field Response to the Geomagnetic Storm on 4 November 2021
* MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
* MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
* Multiscale Aligned Spatial-Temporal Interaction for Video-Based Person Re-Identification
* Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-Stage Action Localization
* Multispectral Blood Smear Background Images Reconstruction for Malaria Unstained Images Normalization, A
* Multispectral UAV Image Classification of Jimson Weed (Datura stramonium L.) in Common Bean (Phaseolus vulgaris L.)
* Multitemporal Hyperspectral Characterization of Wheat Infested by Wheat Stem Sawfly, Cephus cinctus Norton
* Multiview Aerial Visual Recognition (MAVREC): Can Multi-View Improve Aerial Visual Perception?
* Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
* MuRF: Multi-Baseline Radiance Fields
* MuseChat: A Conversational Music Recommendation System for Videos
* MUSIC Based Multipath Delay Estimation Method in the Fractional Domain for OFDM-LFM
* MusicHiFi: Fast High-Fidelity Stereo Vocoding
* Must Unsupervised Continual Learning Relies on Previous Information?
* Mutual Gradient Inversion: Unveiling Privacy Risks of Federated Learning on Multi-Modal Signals
* MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
* MV-Soccer: Motion-Vector Augmented Instance Segmentation for Soccer Player Tracking
* MvAV-pix2pixHD: Multi-view Aerial View Image Translation
* MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
* MVCPS-NeuS: Multi-View Constrained Photometric Stereo for Neural Surface Reconstruction
* MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
* MVDiff: Scalable and Flexible Multi-view Diffusion for 3D Object Reconstruction from Single-View
* MVHumanNet: A Large-Scale Dataset of Multi-View Daily Dressing Human Captures
* MVIP-NeRF: Multi-View 3D Inpainting on NeRF Scenes via Diffusion Prior
* MVP: One-Shot Object Pose Estimation by Matching With Visible Points
* Myth of the Pyramid, The
* N-Point Linear Solver for Line and Motion Estimation with Event Cameras, An
* Named Entity Driven Zero-Shot Image Manipulation
* NAPGuard: Towards Detecting Naturalistic Adversarial Patches
* Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
* Narrowing the Synthetic-to-Real Gap for Thermal Infrared Semantic Image Segmentation Using Diffusion-based Conditional Image Synthesis
* NARUTO: Neural Active Reconstruction from Uncertain Target Observations
* Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
* Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse
* Navigating Beyond Dropout: An Intriguing Solution Towards Generalizable Image Super Resolution
* Navigating Immovable Assets: A Graph-Based Spatio-Temporal Data Model for Effective Information Management
* NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
* NB-GTR: Narrow-Band Guided Turbulence Removal
* NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
* NC-TTT: A Noise Contrastive Approach for Test-Time Training
* Nearest is Not Dearest: Towards Practical Defense Against Quantization-Conditioned Backdoor Attacks
* NEAT: Distilling 3D Wireframes from Neural Attraction Fields
* NECA: Neural Customizable Human Avatar
* Negative Label and Noise Information Guided Disambiguation for Partial Multi-Label Learning
* Neglected Tails in Vision-Language Models, The
* Neighbor Relations Matter in Video Scene Detection
* Neighborhood Multi-Compound Transformer for Point Cloud Registration
* Neighborhood-Aware Mutual Information Maximization for Source-Free Domain Adaptation
* NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
* NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
* NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
* NeRF as Pretraining at Scale: Generalizable 3D-Aware Semantic Representation Learning from View Prediction
* NeRF Director: Revisiting View Selection in Neural Volume Rendering
* NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
* NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
* NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
* NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
* NeRFiller: Completing Scenes via Generative 3D Inpainting
* NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
* NetTrack: Tracking Highly Dynamic Objects with a Net
* NeuRAD: Neural Rendering for Autonomous Driving
* Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes
* Neural Architecture Search as Program Transformation Exploration
* Neural Clustering Based Visual Representation Learning
* Neural Degradation Representation Learning for All-in-One Image Restoration
* Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
* Neural Exposure Fusion for High-Dynamic Range Object Detection
* Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
* Neural Fields for Co-Reconstructing 3D Objects from Incidental 2D Data
* Neural Implicit Morphing of Face Images
* Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
* Neural Lineage
* Neural Markov Random Field for Stereo Matching
* Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces
* Neural Network-Based Estimation of Near-Surface Air Temperature in All-Weather Conditions Using FY-4A AGRI Data over China
* Neural Network-Based Fusion of InSAR and Optical Digital Elevation Models with Consideration of Local Terrain Features
* Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
* Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
* Neural Redshift: Random Networks are not Random Functions
* Neural Refinement for Absolute Pose Regression with Feature Synthesis
* Neural Sign Actors: A diffusion model for 3D sign language production from text
* Neural Spline Fields for Burst Image Fusion and Layer Separation
* Neural Super-Resolution for Real-Time Rendering with Radiance Demodulation
* Neural Underwater Scene Representation
* Neural Video Compression with Feature Modulation
* Neural Visibility Field for Uncertainty-Driven Active Mapping
* Neuromorphic Lip-Reading with Signed Spiking Gated Recurrent Units
* New Agronomists: Language Models are Experts in Crop Management, The
* New Benchmark and Low Computational Cost Localization Method for Cephalometric Analysis, A
* New Framework for Generating Indoor 3D Digital Models from Point Clouds, A
* New Method for Top-Down Inversion Estimation of Carbon Dioxide Flux Based on Deep Learning, A
* New NDSA (Normalized Differential Spectral Attenuation) Measurement Campaign for Estimating Water Vapor along a Radio Link, A
* New Robust Lunar Landing Selection Method Using the Bayesian Optimization of Extreme Gradient Boosting Model (BO-XGBoost), A
* New Subject-Sensitive Hashing Algorithm Based on Multi-PatchDrop and Swin-Unet for the Integrity Authentication of HRRS Image, A
* Newton Time-Reassigned Multi-Synchrosqueezing Wavelet Transform
* NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
* NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
* NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
* Night-Time Vehicle Detection Based on Hierarchical Contextual Information
* NightCC: Nighttime Color Constancy via Adaptive Channel Masking
* Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report, The
* NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
* nnMobileNet: Rethinking CNN for Retinopathy Research
* No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties
* No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
* No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation
* NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer
* NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
* NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
* Noisy Elephant in the Room: Is Your out-of-Distribution Detector Robust to Label Noise?, A
* Noisy One-Point Homographies are Surprisingly Good
* Noisy-Correspondence Learning for Text-to-Image Person Re-Identification
* Non-autoregressive Sequence-to-Sequence Vision-Language Models
* Non-Cascaded and Crosstalk-Free Multi-Image Encryption Based on Optical Scanning Holography Using 2D Orthogonal Compressive Sensing
* Non-Exemplar Class-Incremental Learning by Random Auxiliary Classes Augmentation and Mixed Features
* Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
* Non-Serial Quantization-Aware Deep Optics for Snapshot Hyperspectral Imaging
* Nonlinear Influence of the Built Environment on the Attraction of the Third Activity: A Comparative Analysis of Inflow from Home and Work
* Nonnegative Tensor Representation With Cross-View Consensus for Incomplete Multi-View Clustering
* NOPE: Novel Object Pose Estimation from a Single Image
* Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
* Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
* Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
* Not All Voxels are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
* Novel Algorithm for Multi-Target Angle and Range-Velocity Estimation With MIMO-OFDM Communication Waveforms, A
* Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
* Novel Factor Graph Framework for Tightly Coupled GNSS/INS Integration With Carrier-Phase Ambiguity Resolution, A
* Novel Framework for Spatiotemporal Susceptibility Prediction of Rainfall-Induced Landslides: A Case Study in Western Pennsylvania, A
* Novel High-Speed Data Encryption Scheme for Internet of Medical Things Using Modified Elliptic Curve Diffie-Hellman and Advance Encryption Standard, A
* Novel Hybrid Deep-Learning Approach for Flood-Susceptibility Mapping, A
* Novel Method for Simplifying the Distribution Envelope of Green Tide for Fast Drift Prediction in the Yellow Sea, China, A
* Novel Strategy Coupling Optimised Sampling with Heterogeneous Ensemble Machine-Learning to Predict Landslide Susceptibility, A
* Novel Truncated Norm Regularization Method for Multi-Channel Color Image Denoising, A
* Novel View Synthesis with View-Dependent Effects from a Single Image
* NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
* NTIRE 2024 Challenge on Blind Enhancement of Compressed Image: Methods and Results
* NTIRE 2024 Challenge on Bracketing Image Restoration and Enhancement: Datasets, Methods and Results
* NTIRE 2024 Challenge on HR Depth from Images of Specular and Transparent Surfaces
* NTIRE 2024 Challenge on Image Super-Resolution (×4): Methods and Results
* NTIRE 2024 Challenge on Light Field Image Super-Resolution: Methods and Results
* NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results
* NTIRE 2024 Challenge on Night Photography Rendering
* NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
* NTIRE 2024 Challenge on Stereo Image Super-Resolution: Methods and Results
* NTIRE 2024 Dense and Non-Homogeneous Dehazing Challenge Report
* NTIRE 2024 Image Shadow Removal Challenge Report
* NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
* NTIRE 2024 Restore Any Image Model (RAIM) in the Wild Challenge
* NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
* NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry
* Nutrient Deficiency Classification in Rice Plants Using DenseNet121
* NViST: In the Wild New View Synthesis from a Single Image with Transformers
* OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
* OakInk2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
* Object Dynamics Modeling with Hierarchical Point Cloud-Based Representations
* Object Pose Estimation via the Aggregation of Diffusion Features
* Object Recognition as Next Token Prediction
* Objects as Volumes: A Stochastic Geometry View of Opaque Solids
* Observation of Post-Sunset Equatorial Plasma Bubbles with BDS Geostationary Satellites over South China
* Observation-Guided Diffusion Probabilistic Models
* OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
* OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks
* OCMCTrack: Online Multi-Target Multi-Camera Tracking with Corrective Matching Cascade
* ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
* ODIN: A Single Model for 2D and 3D Segmentation
* ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
* OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
* OGRMPI: An Efficient Multiview Integrated Multiplane Image based on Occlusion Guided Residuals
* OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
* OMG-Seg: Is One Model Good Enough for all Segmentation?
* OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
* Omni-Crack30k: A Benchmark for Crack Segmentation and the Reasonable Effectiveness of Transfer Learning
* Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
* Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-Rank Experts
* OmniControlNet: Dual-stage Integration for Conditional Image Generation
* OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
* OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
* OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
* OmniMotionGPT: Animal Motion Generation with Limited Data
* OMNIPARSER: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
* OmniSDF: Scene Reconstruction Using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
* OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
* OmniVec2: A Novel Transformer Based Network for Large Scale Multimodal and Multitask Learning
* OmniViD: A Generative Framework for Universal Video Understanding
* On a Correlation Model for Laser Scanners: A Large Eddy Simulation Experiment
* On Accuracy and Speed of Geodesic Regression: Do Geometric Priors Improve Learning on Small Datasets?
* On Exact Inversion of DPM-Solvers
* On Perceived AV Synchronization in 360° Multimedia
* On Scaling Up a Multilingual Vision and Language Model
* On the Benefit of Optimal Transport for Curriculum Reinforcement Learning
* On the Capabilities of the IREA-CNR Airborne SAR Infrastructure
* On the Consistency of Stochastic Noise Properties and Velocity Estimations from Different Analysis Strategies and Centers with Environmental Loading and CME Corrections
* On the Content Bias in Fréchet Video Distance
* On the Design of Robust Differential Beamformers From the Beampattern Error Perspective
* On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
* On the Efficiency of Privacy Attacks in Federated Learning
* On the Estimation of Image-Matching Uncertainty in Visual Place Recognition
* On the Faithfulness of Vision Transformer Explanations
* On the Optimality of Inverse Gaussian Approximation for Lognormal Channel Models
* On the Relationships between Clear-Sky Indices in Photosynthetically Active Radiation and Broadband Ranges in Overcast and Broken-Cloud Conditions
* On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
* On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
* On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
* On the Scalability of Diffusion-based Text-to-Image Generation
* On the Strong Convexity of PnP Regularization Using Linear Denoisers
* On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do we Really need Prompt Learning?
* On the Theoretical Link between Optimized Geospatial Conflation Models for Linear Features
* On Train-Test Class Overlap and Detection for Image Retrieval
* On-Orbit Wavelength Calibration Error Analysis of the Spaceborne Hyperspectral Greenhouse Gas Monitoring Instrument Using the Solar Fraunhofer Lines
* Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
* One class classification-based quality assurance of organs-at-risk delineation in radiotherapy
* One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers
* One Fits Many: Class Confusion Loss for Versatile Domain Adaptation
* One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
* One Prompt Word is Enough to Boost Adversarial Robustness for Pre-Trained Vision-Language Models
* One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
* One-Class Face Anti-Spoofing via Spoof Cue Map-Guided Feature Learning
* One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing
* One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
* One-pass View-unaligned Clustering
* One-Prompt to Segment All Medical Images
* One-Shot Open Affordance Learning with Foundation Models
* One-Shot Structure-Aware Stylized Image Synthesis
* One-Step Diffusion with Distribution Matching Distillation
* OneFormer3D: One Transformer for Unified Point Cloud Segmentation
* OneLLM: One Framework to Align All Modalities with Language
* OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
* Online Approach and Evaluation Method for Tracking People Across Cameras in Extremely Long Video Sequence, An
* Online Multi-camera People Tracking with Spatial-temporal Mechanism and Anchor-feature Hierarchical Clustering
* Online Spatio-Temporal Correlation-Based Federated Learning for Traffic Flow Forecasting
* Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
* OoD-Control: Generalizing Control in Unseen Environments
* OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
* Open Vocabulary Semantic Scene Sketch Understanding
* Open-Set Domain Adaptation for Semantic Segmentation
* Open-Vocabulary 3D Semantic Segmentation with Foundation Models
* Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
* Open-vocabulary object 6D pose estimation
* Open-Vocabulary Segmentation with Semantic-Assisted Calibration
* Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
* Open-Vocabulary Text-Driven Human Image Generation
* Open-Vocabulary Video Anomaly Detection
* Open-World Human-Object Interaction Detection via Multi-Modal Prompts
* Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
* Open-World Semantic Segmentation Including Class Similarity
* Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
* Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
* OPEN: Occlusion-Invariant Perception Network for Single Image-Based 3D Shape Retrieval
* OpenBias: Open-Set Bias Detection in Text-to-Image Generative Models
* OpenEQA: Embodied Question Answering in the Era of Foundation Models
* OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
* OpenStory: A Large-Scale Open-Domain Dataset for Subject-Driven Visual Storytelling
* OpenStreetView-5M: The Many Roads to Global Visual Geolocation
* OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities
* OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
* Operational and Climate Land Surface Temperature Products from the Sea and Land Surface Temperature Radiometers on Sentinel-3A and 3B, The
* Opinion-Unaware Blind Image Quality Assessment Using Multi-Scale Deep Feature Statistics
* Opposing Impacts of Greenspace Fragmentation on Land Surface Temperature in Urban and Surrounding Rural Areas: A Case Study in Changsha, China
* Optical Characterization of Coastal Waters with Atmospheric Correction Errors: Insights from SGLI and AERONET-OC
* OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
* Optimal Deployment of Connected and Autonomous Vehicle Dedicated Lanes: A Trade-Off Between Safety and Efficiency
* Optimal Drive-By Sensing in Urban Road Networks With Large-Scale Ridesourcing Vehicles
* Optimal Transport Aggregation for Visual Place Recognition
* Optimization and Application Analysis of Phase Correction Method Based on Improved Image Registration in Ultrasonic Image Detection
* Optimized Martian Dust Displacement Detection Using Explainable Machine Learning
* Optimized Pattern Partitioning for Multi-Pass Printing: PARAOMASKING
* Optimizing Bus Operations at Autonomous Intersection With Trajectory Planning and Priority Control
* Optimizing Diffusion Noise Can Serve As Universal Motion Priors
* Optimizing Object Detection via Metric-driven Training Data Selection
* Optimizing Subband Adaptive Filters for Resilience Against Unanticipated Signal Truncation
* Optimizing Train-to-Train Rescue and Rescheduling in Metro Systems
* Orbit Determination Method for BDS-3 MEO Satellites Based on Multi-Source Observation Links
* Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
* OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
* Order Dispatching Via GNN-Based Optimization Algorithm for On-Demand Food Delivery
* Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation
* Oriented Object Detection Based on Adaptive Feature Learning and Enrichment
* OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
* Orthogonal Adaptation for Modular Customization of Diffusion Models
* Osprey: Pixel Understanding with Visual Instruction Tuning
* OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
* OTCLDA: Optimal Transport and Contrastive Learning for Domain Adaptive Semantic Segmentation
* OTE: Exploring Accurate Scene Text Recognition Using One Token
* Our Deep CNN Face Matchers Have Developed Achromatopsia
* OUR-Net: A Multi-Frequency Network With Octave Max Unpooling and Octave Convolution Residual Block for Pavement Crack Segmentation
* Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
* Outsmarting Biometric Imposters: Enhancing Iris-Recognition System Security through Physical Adversarial Example Generation and PAD Fine-Tuning
* OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
* Overcoming Generic Knowledge Loss with Selective Parameter Update
* Overlap Suppression Clustering for Offline Multi-Camera People Tracking
* Overload: Latency Attacks on Object Detection for Edge Devices
* Overtaking Feasibility Prediction for Mixed Connected and Connectionless Vehicles
* Overview of Text-Based Person Search: Recent Advances and Future Directions, An
* OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
* OVMR: Open-Vocabulary Recognition with Multi-Modal References
* PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
* PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
* Paediatric Pulse Rate Measurements: a Comparison of Methods using Remote Photoplethysmography
* Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
* Paint3D: Paint Anything 3D With Lighting-Less Texture Diffusion Models
* Painter Verification Using Color Palettes: An Exploratory Study
* PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
* Pair-ID: A Dual Modal Framework for Identity Preserving Image Generation
* PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
* PairDETR: Joint Detection and Association of Human Bodies and Faces
* Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
* Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
* PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
* PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
* PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
* PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video
* PAPR in Motion: Seamless Point-level 3D Scene Interpolation
* PARA-Drive: Parallelized Architecture for Real-Time Autonomous Driving
* Parallel Genetic Algorithm With Variable Neighborhood Search for the Vehicle Routing Problem in Forest Fire-Fighting, A
* Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
* Parameter Efficient Fine-Tuning via Cross Block Orchestration for Segment Anything Model
* Parameter Efficient Self-Supervised Geospatial Domain Adaptation
* ParameterNet: Parameters are All You Need for Large-Scale Visual Pretraining of Mobile Networks
* Parametric Binaural Beamforming Based on Auditory Perception
* ParamISP: Learned Forward and Inverse ISPs Using Camera Parameters
* PARASOL: Parametric Style Control for Diffusion Image Synthesis
* PaReNeRF: Toward Fast Large-Scale Dynamic NeRF with Patch-Based Reference
* Part-Aware Correlation Networks for Few-Shot Learning
* Part-Aware Unified Representation of Language and Skeleton for Zero-Shot Action Recognition
* PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
* Partial-to-Partial Shape Matching with Geometric Consistency
* PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
* Passenger Emergency Evacuation in Subway Station Systems: A Bibliometric Analysis and Systematic Review
* Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
* Patch2Self2: Self-Supervised Denoising on Coresets via Matrix Sketching
* PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
* Pavement Crack Translator for Data Augmentation and Pixel-Level Detection Based on Weakly Supervised Learning, A
* PBAG: A Privacy-Preserving Blockchain-Based Authentication Protocol With Global-Updated Commitment in IoVs
* PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds
* PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition
* PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
* PedAST-GCN: Fast Pedestrian Crossing Intention Prediction Using Spatial-Temporal Attention Graph Convolution Networks
* Pedestrian Intrusion Detection in Railway Station Based on Mirror Translation Attention and Feature Pooling Enhancement
* Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification, A
* Pedestrian Simulation Challenges: Modeling Techniques and Emerging Positioning Technologies for ITS Applications
* Peekaboo: Interactive Video Generation via Masked-Diffusion
* PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
* PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
* PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
* PeLK: Parameter-Efficient Large Kernel ConvNets with Peripheral Convolution
* PEM: Prototype-Based Efficient MaskFormer for Image Segmentation
* Penalized Inverse Probability Measure for Conformal Classification, The
* Perada: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
* Perception-Based Prediction for Efficient Kinesthetic Coding
* Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy
* Perception-Oriented Video Frame Interpolation via Asymmetric Blending
* PerceptionGPT: Effectively Fusing Visual Perception Into LLM
* Perceptual Assessment and Optimization of HDR Image Rendering
* Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments
* Perceptual Quality Assessment of Virtual Reality Videos in the Wild
* Performance Analysis of Hammerstein Block-Oriented Functional Link Adaptive Filters
* Performance Analysis of RIS-Assisted Coded Cooperation System Based on Polar Codes With Finite Code Length
* Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery
* Performance of the Earth Explorer 11 SeaSTAR Mission Candidate for Simultaneous Retrieval of Total Surface Current and Wind Vectors
* Permutation Equivariance of Transformers and its Applications
* Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
* Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
* Personalized Residuals for Concept-Driven Text-to-Image Generation
* PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders
* Perspective on Deep Vision Performance with Standard Image and Video Codecs, A
* Perspectives and Challenges in Bolide Infrasound Processing and Interpretation: A Focused Review with Case Studies
* Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
* PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
* PFStorer: Personalized Face Restoration and Super-Resolution
* PGNN-Net: Parallel Graph Neural Networks for Hyperspectral Image Classification Using Multiple Spatial-Spectral Features
* PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-Wise Hardness
* Phase Error Correction in Sparse Linear MIMO Radar Based on the Equivalent Phase Center Principle
* Phenological and Biophysical Mediterranean Orchard Assessment Using Ground-Based Methods and Sentinel 2 Data
* Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models
* Photo-SLAM: Real-Time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
* PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
* Photorealistic Arm Robot Simulation for 3D Plant Reconstruction and Automatic Annotation using Unreal Engine 5
* PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
* PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
* Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
* Physical Backdoor: Towards Temperature-Based Backdoor Attacks in the Physical World
* Physical Property Understanding from Language-Embedded Feature Fields
* Physics Based Camera Privacy: Lens and Network Co-Design to the Rescue
* Physics-Aware Hand-Object Interaction Denoising
* Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification
* Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
* Physics-Informed and Attention-Based Graph Learning Approach for Regional Electric Vehicle Charging Demand Prediction, A
* Physics-Informed Low-Rank Deep Neural Network for Blind and Universal Lens Aberration Correction, A
* PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
* PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
* PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
* Pick-or-Mix: Dynamic Channel Sampling for ConvNets
* Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions, A
* PICTURE: PhotorealistIC Virtual Try-on from UnconstRained dEsigns
* PIE-NeRF: Physics-Based Interactive Elastodynamics with NeRF
* PIGEON: Predicting Image Geolocations
* PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
* PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
* Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
* PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics
* pix2gestalt: Amodal Segmentation by Synthesizing Wholes
* Pixel Aligned Language Models
* Pixel-Level Semantic Correspondence Through Layout-Aware Representation Learning and Multi-Scale Matching Integration
* PixelLM: Pixel Reasoning with Large Multimodal Model
* PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors
* PixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
* PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
* PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
* Plant Species Classification and Biodiversity Estimation from UAV Images with Deep Learning
* PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
* PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment
* Plug and Play Active Learning for Object Detection
* Plug-and-Play Diffusion Distillation
* PMAFusion: Projection-Based Multi-Modal Alignment for 3D Semantic Occupancy Prediction
* PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
* POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
* Point Cloud Pre-Training with Diffusion Models
* Point Cloud Semantic Segmentation by Adaptively Fusing Information With Varying Distances
* Point Clouds Matching Based on Discrete Optimal Transport
* Point Transformer V3: Simpler, Faster, Stronger
* Point, Segment and Count: A Generalized Framework for Object Counting
* Point-Supervised Semantic Segmentation of Natural Scenes via Hyperspectral Imaging
* Point-VOS: Pointing Up Video Object Segmentation
* Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
* Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision
* PointBeV: A Sparse Approach to BeV Predictions
* PointInfinity: Resolution-Invariant Point Diffusion Models
* PointOBB: Learning Oriented Object Detection via Single Point Supervision
* PointOfView: A Multi-modal Network for Few-shot 3D Point Cloud Classification Fusing Point and Multi-view Image Features
* PointPrompt: A Multi-modal Prompting Dataset for Segment Anything Model
* Points2NeRF: Generating Neural Radiance Fields from 3D point cloud
* Polar Matte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting
* Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
* PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
* Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
* Poly Kernel Inception Network for Remote Sensing Detection
* PoNQ: A Neural QEM-Based Mesh Representation
* POPDG: Popular 3D Dance Generation with PopDanceSet
* POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference
* Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
* PortraitBooth: A Versatile Portrait Model for Fast Identity-Preserved Personalization
* Pose Adapted Shape Learning for Large-Pose Face Reenactment
* Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery
* Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction
* PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
* Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
* Post-Occupancy Evaluation of the Improved Old Residential Neighborhood Satisfaction Using Principal Component Analysis: The Case of Wuxi, China
* Posterior Distillation Sampling
* PostureHMR: Posture Transformation for 3D Human Mesh Recovery
* Potential and Electro-Mechanical Coupling Analysis of a Novel HTS Maglev System Employing Double-Sided Homopolar Linear Synchronous Motor
* Potential Risk Localization via Weak Labeling out of Blind Spot
* Potentials in Using VR for Facilitating Geography Teaching in Classrooms: A Systematic Review
* POV-Based Highway Vehicle Trajectory Dataset and Prediction Architecture, A
* PP-SAM: Perturbed Prompts for Robust Adaption of Segment Anything Model for Polyp Segmentation
* PQ-VAE: Learning Hierarchical Discrete Representations with Progressive Quantization
* Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
* Practical Region-level Attack against Segment Anything Models
* Practical Reset Logarithmic Sliding Mode Control for Physical Human-Robot Interaction With Sensorless Behavior Estimation
* PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
* PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation
* PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
* Pre-trained Bidirectional Dynamic Memory Network For Long Video Question Answering
* Pre-Trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
* Pre-trained Vision and Language Transformers are Few-Shot Incremental Learners
* Pre-Training Vision Models with Mandelbulb Variations
* Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
* Predicting Carbohydrate Concentrations in Avocado and Macadamia Leaves Using Hyperspectral Imaging with Partial Least Squares Regressions and Artificial Neural Networks
* Predicting Gross Primary Productivity under Future Climate Change for the Tibetan Plateau Based on Convolutional Neural Networks
* PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
* PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
* Presenting a Long-Term, Reprocessed Dataset of Global Sea Surface Temperature Produced Using the OSTIA System
* Preserving Fairness Generalization in Deepfake Detection
* Preserving Label-Related Domain-Specific Information for Cross-Domain Semantic Segmentation
* Previously on... from Recaps to Story Summarization
* Prior Images Guided Generative Autoencoder Model for Dual-Camera Compressive Spectral Imaging
* Privacy-Preserving Collaboration for Multi-Organ Segmentation via Federated Learning from Sites with Partial Labels
* Privacy-Preserving Cruise Control for Heterogeneous Platoon Vehicle System Under Actuator Faults and Uncertainties
* Privacy-Preserving Distributed Optimal Control for Vehicular Platoon With Quantization
* Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
* Privacy-Preserving Optics for Enhancing Protection in Face De-Identification
* Privacy-Preserving Travel Recommendation Based on Stay Points Over Outsourced Spatio-Temporal Data
* Probabilistic Game-Theoretic Traffic Routing
* Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
* Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
* Probabilistic Statistical Risk Assessment Method for Soil Erosion Using Remote Sensing Data: A Case Study of the Dali River Basin, A
* Probing Conceptual Understanding of Large Visual-Language Models
* Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
* Probing the 3D Awareness of Visual Foundation Models
* Problem-Specific Knowledge Based Multi-Objective Meta-Heuristics Combined Q-Learning for Scheduling Urban Traffic Lights With Carbon Emissions
* Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
* Programmable Motion Generation for Open-Set Motion Control Tasks
* Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
* Progressive Diversity Generation for Single Domain Generalization
* Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI
* Progressive Learning With Cross-Window Consistency for Semi-Supervised Semantic Segmentation
* Progressive Mask Transformer With Edge Enhancement for Image Manipulation Localization
* Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
* Progressive Unsupervised Domain Adaptation for Radio Frequency Signal Attribute Recognition across Communication Scenarios
* Projecting Response of Ecological Vulnerability to Future Climate Change and Human Policies in the Yellow River Basin, China
* Projecting Trackable Thermal Patterns for Dynamic Computer Vision
* ProMark: Proactive Diffusion Watermarking for Causal Attribution
* ProMotion: Prototypes as Motion Learners
* Prompt Augmentation for Self-supervised Text-guided Image Manipulation
* Prompt Highlighter: Interactive Control for Multi-Modal LLMs
* Prompt Learning via Meta-Regularization
* Prompt Learning with One-Shot Setting based Feature Space Analysis in Vision-and-Language Models
* Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
* Prompt-Driven Referring Image Segmentation with Instance Contrasting
* Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
* Prompt-Free Diffusion: Taking Text Out of Text-to-Image Diffusion Models
* Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
* Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
* PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
* PromptCIR: Blind Compressed Image Restoration with Prompt Learning
* PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
* Prompting Foundational Models for Omni-supervised Instance Segmentation
* Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
* Prompting Vision Foundation Models for Pathology Image Analysis
* PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
* PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination
* ProS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval
* ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
* Prototype-based Interpretable Model for Glaucoma Detection
* Prototypical Metric Segment Anything Model for Data-Free Few-Shot Semantic Segmentation
* Proximal Sensing for Characterising Seaweed Aquaculture Crop Conditions: Optical Detection of Ice-Ice Disease
* ProxyCap: Real-Time Monocular Full-Body Capture in World Space via Human-Centric Proxy-to-Motion Learning
* PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
* Prune Efficiently by Soft Pruning
* Pruning as a Binarization Technique
* PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
* Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-Dataset 3D Object Detection
* Pseudo-label based unsupervised fine-tuning of a monocular 3D pose estimation model for sports motions
* Pseudo-Labeling Based Practical Semi-Supervised Meta-Training for Few-Shot Learning
* PSVMLP: Point and Shifted Voxel MLP for 3D deep learning
* Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
* PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
* PTQ4SAM: Post-Training Quantization for Segment Anything
* PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
* PU-Ray: Domain-Independent Point Cloud Upsampling via Ray Marching on Neural Implicit Surface
* Public Bike Scheduling Strategy Based on Demand Prediction for Unbalanced Life-Value Distribution
* Public-Private Attributes-Based Variational Adversarial Network for Audio-Visual Cross-Modal Matching
* PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection
* Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
* Purified and Unified Steganographic Network
* Purposeful Regularization with Reinforcement Learning for Facial Expression Recognition In-the-Wild
* Putting the Object Back into Video Object Segmentation
* PV-Cap: 3D Dynamic Scene Understanding Through Open Physics-based Vocabulary
* Pyramid Fusion Transformer for Semantic Segmentation
* Q-Instruct: Improving Low-Level Visual Abilities for Multi-Modality Foundation Models
* QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers
* QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
* QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
* QUADify: Extracting Meshes with Pixel-Level Details and Materials from Images
* Quality Evaluation of Multi-Source Cropland Data in Alpine Agricultural Areas of the Qinghai-Tibet Plateau
* Quality-based Artifact Modeling for Facial Deepfake Detection in Videos
* Quantifying Creep on the Laohushan Fault Using Dense Continuous GNSS
* Quantifying Task Priority for Multi-Task Optimization
* Quantifying the Pabu Normal Fault Scarp, Southern Tibetan Plateau: Insights into Regional Earthquake Risk
* Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
* QuantNAS: Quantization-aware Neural Architecture Search For Efficient Deployment On Mobile Device
* Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
* Question Aware Vision Transformer for Multimodal Reasoning
* Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
* R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
* Radar Fields: An Extension of Radiance Fields to SAR
* Radar Signal Deinterleaving Method Based on Complex Network and Laplacian Graph Clustering, A
* RadarDistill: Boosting Radar-Based Object Detection Performance via Knowledge Distillation from LiDAR Features
* RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
* Raising the Bar of AI-generated Image Detection with CLIP
* RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
* Random Entangled Tokens for Adversarially Robust Vision Transformer
* RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
* Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
* RankMatch: Exploring the Better Consistency Regularization for Semi-Supervised Semantic Segmentation
* Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
* Rapid 3D Model Generation with Intuitive 3D Input
* Rapid Motor Adaptation for Robotic Manipulator Arms
* RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
* RAVN: Reinforcement Aided Adaptive Vector Quantization of Deep Neural Networks
* RBSFormer: Enhanced Transformer Network for Raw Image Super-Resolution
* RCBEVDet: Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
* RCL: Reliable Continual Learning for Unified Failure Detection
* RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
* RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
* RdmkNet and Toronto-RDMK: Large-Scale Datasets for Road Marking Classification and Segmentation
* RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images
* Re-ID-leak: Membership Inference Attacks Against Person Re-identification
* Re-Thinking Data Availability Attacks Against Deep Neural Networks
* Reactive Composition of UAV Delivery Services in Urban Environments
* Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
* REACTO: Reconstructing Articulated Objects from a Single Video
* READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
* Readout Guidance: Learning Control from Diffusion Features
* Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
* Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
* Real-Time 3D-Aware Portrait Video Relighting
* Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
* Real-Time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination
* Real-Time Collision Mitigation Strategies for Autonomous Vehicles
* Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
* Real-Time Neural BRDF with Spherically Distributed Primitives
* Real-Time Posture Identification System for Wheelchair Users Preventing the Generation of Pressure Ulcers
* Real-Time Predictive Condition Monitoring Using Multivariate Data
* Real-Time Simulated Avatar from Head-Mounted Sensors
* Real-Time Vehicle Tracking-Based Data Forwarding Using RLS in Vehicular Named Data Networking
* Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
* Real-World Mobile Image Denoising Dataset with Efficient Baselines
* RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
* Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
* RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
* RecDiffusion: Rectangling for Image Stitching with Diffusion Models
* Recent Advances for Aerial Object Detection: A Survey
* Recent advances in behavioral and hidden biometrics for personal identification
* Recent Advances in Deep Learning Model Security
* Recent Patterns and Trends of Snow Cover (2000-2023) in the Cantabrian Mountains (Spain) from Satellite Imagery Using Google Earth Engine
* Recipe for Scaling up Text-to-Video Generation with Text-free Videos, A
* Reciprocal Attention Mixing Transformer for Lightweight Image Restoration
* Recognition of Urbanized Areas in UAV-Derived Very-High-Resolution Visible-Light Imagery
* Recognize Anything: A Strong Image Tagging Model
* Recon3D: High Quality 3D Reconstruction from a Single Image Using Generated Back-View Explicit Priors
* ReconFusion: 3D Reconstruction with Diffusion Priors
* Reconstructed Variational Bayesian Kalman Filter Under Heavy-Tailed and Skewed Noises
* Reconstructing a Fine Resolution Landscape of Annual Gross Primary Product (1895-2013) with Tree-Ring Indices
* Reconstructing Hands in 3D with Transformers
* Reconstruction of Coal Mining Subsidence Field by Fusion of SAR and UAV LiDAR Deformation Data
* Reconstruction of FY-4A and FY-4B Cloudless Top-of-Atmosphere Radiation and Full-Coverage Particulate Matter Products Reveals the Influence of Meteorological Factors in Pollution Events, The
* Reconstruction-free Cascaded Adaptive Compressive Sensing
* ReCoRe: Regularized Contrastive Representation Learning of World Model
* Recurrent CNN for Online Object Detection on Raw Radar Frames, A
* Recurrent Generic Contour-Based Instance Segmentation With Progressive Learning
* Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition
* Red-Teaming Segment Anything Model
* Reducing Power Consumption and Latency of Autonomous Vehicles With Efficient Task and Path Assignment in the V2X-MEC Based on Nash Equilibrium
* REFA: Real-time Egocentric Facial Animations for Virtual Reality
* Reference-based GAN Evaluation by Adaptive Inversion
* Referring Expression Counting
* Referring Image Editing: Object-Level Image Editing via Referring Expressions
* Refining Biologically Inconsistent Segmentation Masks with Masked Autoencoders
* Refining Remote Photoplethysmography Architectures using CKA and Empirical Methods
* Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
* ReGenNet: Towards Human Action-Reaction Synthesis
* Region-Based Representations Revisited
* Region-Specific Model Adaptation (RSMA)-Based Training Data Method for Large-Scale Land Cover Mapping, A
* Regional Adversarial Training for Better Robust Generalization
* Regional-Scale Image Segmentation of Sandy Beaches in Southeastern Australia
* RegionGPT: Towards Region Understanding Vision Language Model
* RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
* Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
* Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
* Reinforced Voxel-RCNN: An Efficient 3D Object Detection Method Based on Feature Aggregation
* Relation Rectification in Diffusion Model
* Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
* Relational Experience Replay: Continual Learning by Adaptively Tuning Task-Wise Relationship
* Relational Maneuvering of Leader-Follower Unmanned Aerial Vehicles for Flexible Formation
* Relational Matching for Weakly Semi-Supervised Oriented Object Detection
* Relational Part-Aware Learning for Complex Composite Object Detection in High-Resolution Remote Sensing Images
* Relationship Between the False Alarm Probability and the Atomic Norm Denoising Regularization Parameter for Line Spectral Estimation, A
* Relationship between Vegetation and Soil Moisture Anomalies Based on Remote Sensing Data: A Semiarid Rangeland Case
* Relaxed Contrastive Learning for Federated Learning
* Relevance Pooling Guidance and Class-Balanced Feature Enhancement for Fine-Grained Oriented Object Detection in Remote Sensing Images
* RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
* Reliable Image Matching Using Optimal Combination of Color and Intensity Information Based on Relationship with Surrounding Objects
* Reliable Trajectory Prediction and Uncertainty Quantification with Conditioned Diffusion Models
* Relightable and Animatable Neural Avatar from Sparse-View Video
* Relightable Gaussian Codec Avatars
* Relightful Harmonization: Lighting-Aware Portrait Background Replacement
* Remote Sensing and Geospatial Approaches for Studying the Environment Affected by Human Activities
* Remote Sensing Detection of Growing Season Freeze-Induced Defoliation of Montane Quaking Aspen (Populus tremuloides) in Southern Utah, USA
* Remote Sensing Image Dehazing Using Multi-Scale Gated Attention for Flight Simulator
* Remote Sensing Mapping and Analysis of Spatiotemporal Patterns of Land Use and Cover Change in the Helong Region of the Loess Plateau Region (1986-2020)
* Remote Sensing of Chlorophyll-a in Clear vs. Turbid Waters in Lakes
* Remote Sensing-Based Multiscale Analysis of Total and Groundwater Storage Dynamics over Semi-Arid North African Basins
* Remote-Sensing-Based Method Using Rockfall Inventories for Hazard Mapping at the Community Scale in the Arequipa Region of Peru, A
* Remotely Piloted Aircraft for Evaluating the Impact of Frost in Coffee Plants: Interactions between Plant Age and Topography
* Remotely Sensed Comparative Spatiotemporal Analysis of Drought and Wet Periods in Distinct Mediterranean Agroecosystems
* ReMOVE: A Reference-free Metric for Object Erasure
* RepViT: Revisiting Mobile CNN From ViT Perspective
* RepAn: Enhanced Annealing through Re-parameterization
* Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling
* RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
* RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis
* Representation-Learning-Based Graph and Generative Network for Hyperspectral Small Target Detection, A
* Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision
* Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
* Repurposing the Image Generative Potential: Exploiting GANs to Grade Diabetic Retinopathy
* Rescaling large datasets based on validation outcomes of a pre-trained network
* Research on Design and Staged Deployment of LEO Navigation Constellation for MEO Navigation Satellite Failure
* Research on Digital Twin Method for Spaceborne Along-Track Interferometric Synthetic Aperture Radar Velocity Inversion of Ocean Surface Currents
* Research on Leaf Area Index Inversion Based on LESS 3D Radiative Transfer Model and Machine Learning Algorithms
* Research on Multiscale Atmospheric Chaos Based on Infrared Remote-Sensing and Reanalysis Data
* Research on TCN Model Based on SSARF Feature Selection in the Field of Human Behavior Recognition
* Residual Denoising Diffusion Models
* Residual Learning in Diffusion Models
* Residual-based Language Models are Free Boosters for Biomedical Imaging Tasks
* Resilient Time-Varying Formation Tracking Control for General Linear Multiagent Systems With a Nonautonomous Leader and Adversarial Followers
* Resolution Limit of Single-Photon LiDAR
* Resource Allocation of Netted Opportunistic Array Radar for Maneuvering Target Tracking under Uncertain Conditions
* Resource-Efficient Transformer Pruning for Finetuning of Large Models
* Response of NO 5.3 μm Emission to the Geomagnetic Storm on 24 April 2023
* Response of Upper Ocean to Parameterized Schemes of Wave Breaking under Typhoon Condition
* Restoration by Generation with Constrained Priors
* Restructuring the Teacher and Student in Self-Distillation
* Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
* Rethinking Boundary Discontinuity Problem for Oriented Object Detection
* Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
* Rethinking Few-shot 3D Point Cloud Semantic Segmentation
* Rethinking FID: Towards a Better Evaluation Metric for Image Generation
* Rethinking Generalizable Face Anti-Spoofing via Hierarchical Prototype-Guided Distribution Refinement in Hyperbolic Space
* Rethinking Human Motion Prediction with Symplectic Integral
* Rethinking Inductive Biases for Surface Normal Estimation
* Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
* Rethinking Multi-Domain Generalization with A General Learning Objective
* Rethinking Multi-View Representation Learning via Distilled Disentangling
* Rethinking Out-of-Distribution Detection From a Human-Centric Perspective
* Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
* Rethinking the Domain Gap in Near-infrared Face Recognition
* Rethinking the Evaluation Protocol of Domain Generalization
* Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
* Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
* Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
* Rethinking the Up-Sampling Operations in CNN-Based Generative Network for Generalizable Deepfake Detection
* Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
* Retina: Low-Power Eye Tracking with Event Camera and Spiking Hardware
* RetinaLiteNet: A Lightweight Transformer based CNN for Retinal Feature Segmentation
* Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
* Retrieval and Comparison of Multi-Satellite Polar Ozone Data from the EMI Series Instruments
* Retrieval-Augmented Egocentric Video Captioning
* Retrieval-Augmented Embodied Agents
* Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
* Retrieval-Augmented Open-Vocabulary Object Detection
* Retrospective Analysis of Municipal Geoportal Usability in the Context of the Evolution of Online Data Presentation Techniques
* Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space
* Revenge of BiSeNet: Efficient Multi-Task Image Segmentation, The
* Reversal of the Spatiotemporal Patterns at the End of the Growing Season of Typical Steppe Vegetation in a Semi-Arid Region by Increased Precipitation
* Reversible Data Hiding in Encrypted Images With Asymmetric Coding and Bit-Plane Block Compression
* Review and Efficient Implementation of Scene Graph Generation Metrics, A
* Review of Pakistan's National Spatial Data Infrastructure Using Multiple Assessment Frameworks, A
* Review of Satellite Remote Sensing of Carbon Dioxide Inversion and Assimilation
* Revisiting Adversarial Training at Scale
* Revisiting Adversarial Training Under Long-Tailed Distributions
* Revisiting Color Constancy Using CNNs: Including Recent Observations
* Revisiting Counterfactual Problems in Referring Expression Comprehension
* Revisiting Global Translation Estimation with Feature Tracks
* Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
* Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters
* Revisiting Sampson Approximations for Geometric Estimation Problems
* Revisiting Single Image Reflection Removal in the Wild
* Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion
* Revisiting the 2017 Jiuzhaigou (Sichuan, China) Earthquake: Implications for Slip Inversions Based on InSAR Data
* Revisiting the Domain Gap Issue in Non-cooperative Spacecraft Pose Tracking
* Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
* ReweightOOD: Loss Reweighting for Distance-based OOD Detection
* Rewrite the Stars
* RGB-D Cube R-CNN: 3D Object Detection with Selective Modality Dropout
* RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
* RGBT Tracking via Progressive Fusion Transformer With Dynamically Guided Learning
* Rich Human Feedback for Text-to-Image Generation
* RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
* Riemannian Multinomial Logistics Regression for SPD Neural Networks
* Rigid Formation Control on a Sphere: A Heterogeneous System Approach
* RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
* RIS-Assisted Multi-Aperture FSO Communication Network for High-Speed Train: Second-Order Statistical Analysis
* RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-Grained Correctional Human Feedback
* RLNet: Robust Linearized Networks for Efficient Private Inference
* RMem: Restricted Memory Banks Improve Video Object Segmentation
* RMS-FlowNet++: Efficient and Robust Multi-scale Scene Flow Estimation for Large-Scale Point Clouds
* RMT: Retentive Networks Meet Vision Transformers
* RNb-NeuS: Reflectance and Normal-Based Multi-View 3D Reconstruction
* RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods, A
* Road Network Intelligent Selection Method Based on Heterogeneous Graph Attention Neural Network
* Road Object Detection Robust to Distorted Objects at the Edge Regions of Images
* Robotic Grasp Detection Using Structure Prior Attention and Multiscale Features
* Robust Adaptive Leader-Following Formation Control of Nonlinear Multiagents Using Three-Layer Neural Networks
* Robust and Explainable Fine-Grained Visual Classification with Transfer Learning: A Dual-Carriageway Framework
* Robust clustering algorithm: The use of soft trimming approach
* Robust Color Image Hashing With Nonnegative Matrix Factorization and Saliency Map for Copy Detection
* Robust Data Augmentation and Ensemble Method for Object Detection in Fisheye Camera Images
* Robust Depth Enhancement via Polarization Prompt Fusion Tuning
* Robust Direction Estimation of Terrestrial Signal via Sparse Non-Uniform Array Reconfiguration under Perturbations
* Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data
* Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
* Robust Emotion Recognition in Context Debiasing
* Robust Feature Matching via Graph Neighborhood Motion Consensus
* Robust Formation Tracking Control for Noncooperative Heterogeneous Multiagent Systems
* Robust Hashing for Neural Network Models via Heterogeneous Graph Representation
* Robust Image Denoising Through Adversarial Frequency Mixup
* Robust Image Registration for Power Equipment Using Large-Gap Fracture Contours
* Robust Longitudinal Control for Vehicular Platoons Using Deep Reinforcement Learning
* Robust Motorcycle Helmet Detection in Real-World Scenarios: Using Co-DETR and Minority Class Enhancement
* Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
* Robust Object Selection in Spontaneous Gaze-Controlled Application Using Exponential Moving Average and Hidden Markov Model
* Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction, A
* Robust Overfitting Does Matter: Test-Time Adversarial Purification with FGSM
* Robust Perspective-n-Crater for Crater-based Camera Pose Estimation
* Robust RIS-Based DOA Estimation With Mixed Constraints
* Robust Self-Calibration of Focal Lengths from the Fundamental Matrix
* Robust Synthetic-to-Real Transfer for Stereo Matching
* Robust Trajectory and Resource Allocation for UAV Communications in Uncertain Environments With No-Fly Zone: A Deep Learning Approach
* Robust Transit Frequency Setting Problem With Demand Uncertainty
* Robust Translational Motion Compensation Method for Moving Target ISAR Imaging Based on Phase Difference-Lv's Distribution and Auto-Cross-Correlation Algorithm, A
* Robust Underwater Direction-of-Arrival Estimation Method Using Acoustic Sensor Array under Unknown Swing Deviation Elements
* Robust Unpaired Image Dehazing via Adversarial Deformation Constraint
* Robustness Analysis on Foundational Segmentation Models
* RobustSAM: Segment Anything Robustly on Degraded Images
* RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
* RoHM: Robust Human Motion Reconstruction via Diffusion
* Rolling Shutter Correction with Intermediate Distortion Flow Estimation
* RoMa: Robust Dense Feature Matching
* RoMo: Robust Unsupervised Multimodal Learning With Noisy Pseudo Labels
* Root Cause Analysis of a Self-Driving Car Dragging a Pedestrian, A
* Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
* Rotation-Agnostic Image Representation Learning for Digital Pathology
* Rotational Motion Compensation for ISAR Imaging Based on Minimizing the Residual Norm
* RS-2-BP: A Unified Deep Learning Framework for Deriving EIT-Based Breathing Patterns From Respiratory Sounds
* RSO-SLAM: A Robust Semantic Visual SLAM With Optical Flow in Complex Dynamic Environments
* RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
* RTracker: Recoverable Tracking via PN Tree Structured Memory
* Rugby Scene Classification Enhanced by Vision Language Model
* Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns
* RUN: Rethinking the UNet Architecture for Efficient Image Restoration
* S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
* S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
* S2MVTC: A Simple Yet Efficient Scalable Multi-View Tensor Clustering
* S3R-Net: A Single-Stage Approach to Self-Supervised Shadow Removal
* SA-SatMVS: Slope Feature-Aware and Across-Scale Information Integration for Large-Scale Earth Terrain Multi-View Stereo
* SaCo Loss: Sample-Wise Affinity Consistency for Vision-Language Pre-Training
* SACReg: Scene-Agnostic Coordinate Regression for Visual Localization
* SAD-GS: Shape-aligned Depth-supervised Gaussian Splatting
* SADCMF: Self-Attentive Deep Consistent Matrix Factorization for Micro-Video Multi-Label Classification
* SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
* Safe Reinforcement Learning in Autonomous Driving With Epistemic Uncertainty Estimation
* SAI3D: Segment any Instance in 3D Scenes
* Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
* Saliency-based video summarization for face anti-spoofing
* Salient Object-Aware Background Generation using Text-Guided Diffusion Models
* Salp Swarm Algorithm-Based Kalman Filter for Seamless Multi-Source Fusion Positioning with Global Positioning System/Inertial Navigation System/Smartphones
* SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
* SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
* SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention
* SANeRF-HQ: Segment Anything for NeRF in High Quality
* SAOR: Single-View Articulated Object Reconstruction
* SAR-NTV-YOLOv8: A Neural Network Aircraft Detection Method in SAR Images Based on Despeckling Preprocessing
* Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
* Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
* Satellite Remote Sensing Grayscale Image Colorization Based on Denoising Generative Adversarial Network
* Satellite-Based Detection of Algal Blooms in Large Alpine Lake Sevan: Can Satellite Data Overcome the Unavoidable Limitations in Field Observations?
* SatSynth: Augmenting Image-Mask Pairs Through Diffusion Models for Aerial Semantic Segmentation
* SA^3WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection
* SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
* SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
* Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
* Scalable 3D Registration via Truncated Entry-Wise Absolute Residuals
* Scalable Algorithms for Bicriterion Trip-Based Transit Routing
* Scalable Deep Color Quantization: A Cluster Imitation Approach
* Scale Decoupled Distillation
* Scale- and Resolution-Adapted Shaded Relief Generation Using U-Net
* Scale-aware token-matching for transformer-based object detector
* Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
* Scaling Graph Convolutions for Mobile Vision
* Scaling Laws for Data Filtering: Data Curation Cannot be Compute Agnostic
* Scaling Laws of Synthetic Images for Model Training ... for Now
* Scaling Up Dynamic Human-Scene Interaction Modeling
* Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
* Scaling Up Video Summarization Pretraining with Large Language Models
* ScanFormer: Referring Expression Comprehension by Iteratively Scanning
* Scattering Prompt Tuning: A Fine-tuned Foundation Model for SAR Object Recognition
* SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
* SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
* Scene Adaptive Sparse Transformer for Event-based Object Detection
* Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
* SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
* SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
* SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
* Scientometric Analysis of Quantum Algorithms for VANET Optimization
* SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations
* SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
* SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
* Score-Guided Diffusion for 3D Human Recovery
* ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
* Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
* SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
* Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-Training
* SD-DiT: Unleashing the Power of Self-Supervised Discrimination in Diffusion Transformer
* SD2Event: Self-Supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
* SDCNet: Spatially-Adaptive Deformable Convolution Networks for HR NonHomogeneous Dehazing
* SDDGR: Stable Diffusion-Based Deep Generative Replay for Class Incremental Object Detection
* SDFConnect: Neural Implicit Surface Reconstruction of a Sparse Point Cloud with Topological Constraints
* SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
* SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
* Sea-Land Segmentation of Remote-Sensing Images with Prompt Mask-Attention
* SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects
* Seamless Human Motion Composition with Blended Positional Encodings
* Seamless Weather Data Integration in Trajectory-Based Operations Utilizing Geospatial Information
* SEAS: ShapE-Aligned Supervision for Person Re-Identification
* Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data
* SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
* Secure Spatio-Temporal Chaotic Pseudorandom Generator for Image Encryption, A
* Security Protocol for Vehicle Platoon Verification Using Optical Camera Communications, A
* SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
* SeD: Semantic-Aware Discriminator for Image Super-Resolution
* See, Say, and Segment: Teaching LMMs to Overcome False Premises
* SEED-Bench: Benchmarking Multimodal Large Language Models
* Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
* Seeing Motion at Nighttime with an Event Camera
* Seeing the Unseen: Visual Common Sense for Semantic Placement
* Seeing the Vibration from Fiber-Optic Cables: Rain Intensity Monitoring using Deep Frequency Filtering
* Seeing the World through Your Eyes
* Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
* SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
* Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
* SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation
* Segment and Caption Anything
* Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
* Segment Anything in Food Images
* Segment Anything Model Combined with Multi-Scale Segmentation for Extracting Complex Cultivated Land Parcels in High-Resolution Remote Sensing Images
* Segment Anything Model for Road Network Graph Extraction
* Segment Every Out-of-Distribution Object
* Segmentation and Completion of Human Motion Sequence via Temporal Learning of Subspace Variety Model
* Segmentation-Free Guidance for Text-to-Image Diffusion Models
* Segmentation-Free Velocity Field Super-Resolution on 4D Flow MRI
* SeIF: Semantic-Constrained Deep Implicit Function for Single-Image 3D Head Reconstruction
* Seismic Imaging of the Arctic Subsea Permafrost Using a Least-Squares Reverse Time Migration Method
* Selecting Erosion- and Deposition-Dominated Zones in the Jezero Delta Using a Water Flow Model for Targeting Future In Situ Mars Surface Missions
* Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
* Selective Multi-View Deep Model for 3D Object Classification
* Selective Nonlinearities Removal from Digital Signals
* Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
* Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
* Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection
* Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
* Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
* Self-Attention Progressive Network for Infrared and Visible Image Fusion
* Self-Calibrating Vicinal Risk Minimisation for Model Calibration
* Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer
* Self-Correcting LLM-Controlled Diffusion Models
* Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
* Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
* Self-Similarity Prior Distillation for Unsupervised Remote Physiological Measurement
* Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
* Self-Supervised Debiasing Using Low Rank Regularization
* Self-supervised Domain Adaptation with Significance-Oriented Masking for Pelvic Organ Prolapse detection
* Self-Supervised Dual Contouring
* Self-Supervised Facial Representation Learning with Facial Region Awareness
* Self-Supervised Learning across the Spectrum
* Self-Supervised Learning with Generative Adversarial Networks for Electron Microscopy
* Self-supervised multi-echo point cloud denoising in snowfall
* Self-Supervised Multi-Object Tracking with Path Consistency
* Self-Supervised Representation Learning from Arbitrary Scenarios
* Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
* SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
* SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
* Semantic Controllable Long Text Steganography Framework Based on LLM Prompt Engineering and Knowledge Graph, A
* Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach
* Semantic guidance-based fusion network for multi-label image classification, A
* Semantic Human Mesh Reconstruction with Textures
* Semantic Line Combination Detector
* Semantic Mapping of Landscape Morphologies: Tuning ML/DL Classification Approaches for Airborne LiDAR Data
* Semantic Pre-supplement for Exposure Correction
* Semantic Progressive Guidance Network for RGB-D Mirror Segmentation
* Semantic Segmentation and Classification of Active and Abandoned Agricultural Fields through Deep Learning in the Southern Peruvian Andes
* Semantic Segmentation-Driven Integration of Point Clouds from Mobile Scanning Platforms in Urban Environments
* Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-Grained Knowledge Alignment
* Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction
* Semantic-Aware Message Broadcasting for Efficient Unsupervised Domain Adaptation
* Semantic-Aware Multi-Label Adversarial Attacks
* Semantic-aware SAM for Point-Prompted Instance Segmentation
* Semantic-Enhanced Proxy-Guided Hashing for Long-Tailed Image Retrieval
* Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
* Semantics, Distortion, and Style Matter: Towards Source-Free UDA for Panoramic Segmentation
* Semantics-Aware Motion Retargeting with Vision-Language Models
* SemCity: Semantic Scene Generation with Triplane Diffusion
* Semi-Stereo: A Universal Stereo Matching Framework for Imperfect Data via Semi-supervised Learning
* Semi-Supervised Crowd Counting With Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes
* Semi-Supervised Multitask Learning Using Gaze Focus for Gaze Estimation
* Semi-Supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint, A
* Semi-Supervised Remote Sensing Building Change Detection with Joint Perturbation and Feature Complementation
* Semi-Supervised Subcategory Centroid Alignment-Based Scene Classification for High-Resolution Remote Sensing Images
* SemiGPC: Distribution-Aware Label Refinement for Imbalanced Semi-Supervised Learning Using Gaussian Processes
* SeMoLi: What Moves Together Belongs Together
* SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
* Sensitivity-Aware Density Estimation in Multiple Dimensions
* Sensor Equivariance: A Framework for Semantic Segmentation with Diverse Camera Models
* Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm Without Real Data Exposure
* Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
* Separating lungs in CT scans for improved COVID19 detection
* Separating the Chirp from the Chat: Self-supervised Visual Grounding of Sound and Language
* Sequential Modeling Enables Scalable Learning for Large Vision Models
* SETA: Semantic-Aware Edge-Guided Token Augmentation for Domain Generalization
* Severity Detection of Diabetic Retinopathy: A Review
* SF-IQA: Quality and Similarity Integration for AI Generated Image Quality Assessment
* SfM Photogrammetry for Cost-Effective 3D Documentation and Rock Art Analysis of the Dombate Dolmen (Spain) and the Megalithic Sites of Chã dos Cabanos and Chã da Escusalha (Portugal)
* SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
* SFOD: Spiking Fusion Object Detector
* SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
* SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and its Downstream Tasks
* SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation
* Shadow Generation for Composite Image Using Diffusion Model
* Shadow Removal based on Diffusion, Segmentation and Super-resolution Models
* Shadow Removal via Global Residual Free Unet and Shadow Generation
* Shadow-Enlightened Image Outpainting
* ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer
* Shadows Don't Lie and Lines Can't Bend! Generative Models Don't know Projective Geometry ... for Now
* Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
* Shap-Editor: Instruction-guided Latent 3D Editing in Seconds
* Shape-Preserving Generation of Food Images for Automatic Dietary Assessment
* ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
* ShapeWalk: Compositional Shape Editing Through Language-Guided Chains
* Sharingan: A Transformer Architecture for Multi-Person Gaze Following
* sharper definition of alignment for Panoptic Quality, A
* Sharpness-Aware Optimization for Real-World Adversarial Attacks for Diverse Compute Platforms with Enhanced Transferability
* Sheared Backpropagation for Fine-Tuning Foundation Models
* Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
* ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
* SHiNe: Semantic Hierarchy Nexus for Open-Vocabulary Object Detection
* SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
* Ship Contour Extraction from Polarimetric SAR Images Based on Polarization Modulation
* Ship Grid: A Novel Anchor-Free Ship Detection Algorithm
* Ship Wake Detection in a Single SAR Image via a Modified Low-Rank Constraint
* Short-form UGC Video Quality Assessment Based on Multi-Level Video Fusion with Rank-Aware
* Show, Think, and Tell: Thought-Augmented Fine-Tuning of Large Language Models for Video Captioning
* SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
* SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
* Siamese InternImage for Change Detection
* Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
* Siamese Neural Network Framework With Sememe-Based Context Extraction for Interactive Argument Pair Identification, A
* Sieve: Multimodal Dataset Pruning Using Image Captioning Models
* SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
* SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
* SignGraph: A Sign Sequence is Worth Graphs of Nodes
* SIM-OFE: Structure Information Mining and Object-Aware Feature Enhancement for Fine-Grained Visual Categorization
* SimAC: A Simple Anti-Customization Method for Protecting Face Privacy Against Text-to-Image Synthesis of Diffusion Models
* SimDA: Simple Diffusion Adapter for Efficient Video Generation
* Simple and Effective Point-Based Network for Event Camera 6-DOFs Pose Relocalization, A
* Simple Baseline for Efficient Hand Mesh Reconstruction, A
* Simple In-place Data Augmentation for Surveillance Object Detection
* Simple Recipe for Contrastively Pre-Training Video-First Encoders Beyond 16 Frames, A
* Simple Recipe for Language-Guided Domain Generalized Segmentation, A
* Simple Semantic-Aided Few-Shot Learning
* SimpliCity: Reconstructing Buildings with Simple Regularized 3D Models
* Simulation and Forecast of Coastal Ecosystem Services in Jiaodong Peninsula Based on SSP-RCP Scenarios
* Simulation Platform for Truck Platooning Evaluation in an Interactive Traffic Environment, A
* Simultaneous Temperature Estimation and Nonuniformity Correction From Multiple Frames
* Single Domain Generalization for Crowd Counting
* Single image dehazing based on multi-label graph cuts
* Single Mesh Diffusion Models with Field Latents for Texture Generation
* Single View Refractive Index Tomography with Neural Fields
* Single-Model and Any-Modality for Video Object Tracking
* Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking
* Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
* Single-View Scene Point Cloud Human Grasp Generation
* SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
* SinSR: Diffusion-Based Image Super-Resolution in a Single Step
* SIRA: Scalable Inter-Frame Relation and Association for Radar Perception
* SiT-MLP: A Simple MLP With Point-Wise Topology Feature Learning for Skeleton-Based Action Recognition
* SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
* Situation Monitor: Diversity-Driven Zero-Shot Out-of-Distribution Detection using Budding Ensemble Architecture for Object Detection
* Situational Awareness Matters in 3D Vision Language Reasoning
* Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
* Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
* SketchINR: A First Look into Sketches as Implicit Neural Representations
* SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
* SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers
* SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
* SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping
* SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
* Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction
* SLICE: Stabilized LIME for Consistent Explanations for Image Classification
* SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers
* SMALE: Hyperspectral Image Classification via Superpixels and Manifold Learning
* Small Fixed-Wing Unmanned Aerial Vehicle Path Following Under Low Altitude Wind Shear Disturbance
* Small Scale Data-Free Knowledge Distillation
* Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
* Smart Contract-Based Decentralized Data Sharing and Content Delivery for Intelligent Connected Vehicles in Edge Computing
* Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
* SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models
* SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
* SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
* Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
* SnAG: Scalable and Accurate Video Grounding
* Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
* Snapshot Lidar: Fourier Embedding of Amplitude and Phase for Single-Image Depth Reconstruction
* Snapshot Spectral Imaging for Face Anti-Spoofing: Addressing Data Challenges with Advanced Processing and Training
* SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
* SNI-SLAM: Semantic Neural Implicit SLAM
* SNIDA: Unlocking Few-Shot Object Detection with Non-Linear Semantic Decoupling Augmentation
* Sniffer: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
* Snow Cover Extraction from Landsat 8 OLI Based on Deep Learning with Cross-Scale Edge-Aware and Attention Mechanism
* SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
* SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap
* SoccerNet-Depth: a Scalable Dataset for Monocular Depth Estimation in Sports Videos
* Social-Aware Assisted Edge Collaborative Caching Based on Deep Reinforcement Learning Joint With Digital Twin Network in Internet of Vehicles
* SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
* SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
* SODA: Bottleneck Diffusion Models for Representation Learning
* SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions
* Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement
* Soil Moisture-Derived SWDI at 30 m Based on Multiple Satellite Datasets for Agricultural Drought Monitoring
* Soil Salinity Mapping of Plowed Agriculture Lands Combining Radar Sentinel-1 and Optical Sentinel-2 with Topographic Data in Machine Learning Models
* SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
* Solar Cycle Dependence of Migrating Diurnal Tide in the Equatorial Mesosphere and Lower Thermosphere
* Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
* Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
* Sonic VisionLM: Playing Sound with Vision Language Models
* SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
* Source Range Estimation Using Linear Frequency-Difference Matched Field Processing in a Shallow Water Waveguide
* Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions
* Source-Free Domain Adaptation of Weakly-Supervised Object Localization Models for Histology
* Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
* Space Non-Cooperative Target Recognition Method for Multi-Satellite Cooperative Observation Systems, A
* Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
* Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
* SPAD: Spatially Aware Multi-View Diffusers
* Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
* Sparse Global Matching for Video Frame Interpolation with Large Motion
* Sparse multi-view hand-object reconstruction for unseen environments
* Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
* Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-Light Photometric Stereo
* SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
* Sparsity-Based Adaptive Beamforming for Coherent Signals With Polarized Sensor Arrays
* Sparsity-Enhanced Constrained Least-Squares Spectral Analysis with Greedy-FISTA
* Spatial Adaptive Filter Network With Scale-Sharing Convolution for Image Demoiréing
* Spatial and Temporal Dynamics in Vegetation Greenness and Its Response to Climate Change in the Tarim River Basin, China
* Spatial and Temporal Variations of Total Suspended Matter Concentration during the Dry Season in Dongting Lake in the Past 35 Years
* Spatial Feature-Based ISAR Image Registration for Space Targets
* Spatial-Aware Regression for Keypoint Localization
* Spatial-Frequency Feature Fusion Network for Lightweight and Arbitrary-Sized JPEG Steganalysis
* Spatial-Frequency Fusion for Bayer Demosaicking
* Spatial-Frequency-Based Selective Fixed-Filter Algorithm for Multichannel Active Noise Control
* Spatial-State-Based Omni-Directional Collision Warning System for Intelligent Vehicles, A
* Spatial-Temporal Analysis of the Effects of Frost and Temperature on Vegetation in the Third Pole Based on Remote Sensing
* Spatial-Temporal Assessment of Eco-Environment Quality with a New Comprehensive Remote Sensing Ecological Index (CRSEI) Based on Quaternion Copula Function
* Spatially-Adaptive Large-Kernel Network for Efficient Image Super-Resolution
* SpatialTracker: Tracking Any 2D Pixels in 3D Space
* SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
* Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
* Spatio-Temporal Corridor-Based Motion Planning of Lane Change Maneuver for Autonomous Driving in Multi-Vehicle Traffic
* Spatio-Temporal Multi-Image Reflection Removal
* Spatio-Temporal Turbulence Mitigation: A Translational Perspective
* Spatiotemporal Climatic Signal Denoising Based on Spatiotemporal Variability Index
* Spatiotemporal Feature Fusion for Video Summarization
* Spatiotemporal Orthogonal Projection Capsule Network for Incremental Few-Shot Action Recognition
* Spatiotemporal Prediction of Conflict Fatality Risk Using Convolutional Neural Networks and Satellite Imagery
* Spatiotemporal Variations and Characteristics of CO, H2CO and HCN Emissions from Biomass Burning Monitored by FTIR Spectroscopy
* SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
* SpecNeRF: Gaussian Directional Encoding for Specular Reflections
* Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
* Spectral Calibration of the Spectrometer on Board the Colombian FACSAT-2 Satellite Mission
* Spectral Imaging Methods for Estimating Fluorescence Emission Spectra from Plant Grains and Leaves
* Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
* Spectrum AUC Difference (SAUCD): Human-Aligned 3D Shape Evaluation
* Specularity Factorization for Low-Light Enhancement
* Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs
* Speed-Improved Off-Focus Imaging Technique for Real-Aperture Imaging System Based on Wavenumber Spectrum Fusion
* Spherical Magnetic Vector Forwarding of Isoparametric DGGS Cells with Natural Superconvergent Points
* Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
* SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
* SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
* Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
* SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
* SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
* Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
* SPIN: Simultaneous Perception, Interaction and Navigation
* SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
* SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
* Splatter Image: Ultra-Fast Single-View 3D Reconstruction
* SplattingAvatar: Realistic Real-Time Human Avatars With Mesh-Embedded Gaussian Splatting
* Split Computing With Scalable Feature Compression for Visual Analytics on the Edge
* Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
* SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
* SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
* SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
* SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
* SR-Adv: Salient Region Adversarial Attacks on 3D Point Clouds for Autonomous Driving
* SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes
* SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
* SSBAS-InSAR: A Spatially Constrained Small Baseline Subset InSAR Technique for Refined Time-Series Deformation Monitoring
* SSN: Scale Selection Network for Multi-Scale Object Detection in Remote Sensing Images
* SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
* ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos
* ST2ST: Self-Supervised Test-time Adaptation for Video Action Recognition
* Stability and Safety Analysis of Connected and Automated Vehicle Platoon Considering Dynamic Communication Topology
* Stabilization of the Spectral Power Distribution of a Tunable Multichannel LED Lighting System
* Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
* Stable VITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
* STADet: Streaming Timing-Aware Video Lane Detection
* StampOne: Addressing Frequency Balance in Printer-proof Steganography
* STARRIS-Assisted IoV NOMA Networks With Hardware Impairments and Imperfect CSI
* State Space Models for Event Cameras
* State-Space Modeling and Feedback Control for Real-Time Automatic Train Timetable Rescheduling of Intercity HSRs
* Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
* STD-Net: Spatio-Temporal Decomposition Network for Video Demoiréing With Sparse Transformers
* Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack Against Split Learning, A
* Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
* StegaNeRV: Video Steganography using Implicit Neural Representation
* Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
* StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
* Step Differences in Instructional Video
* StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
* Stitching from Spectral Filter Array Video Sequences
* Stochastic Predictive Adaptive Cruise Control System With Uncertainty-Aware Velocity Prediction and Parameter Self-Learning, A
* STORM: A Spatio-Temporal Context-Aware Model for Predicting Event-Triggered Abnormal Crowd Traffic
* StraightPCF: Straight Point Cloud Filtering
* Strategic Framework for Establishing Additional In Situ Data Acquisition Sites for Satellite Data Calibration and Validation: A Case Study in South Korean Forests, A
* Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models
* Strategies to Leverage Foundational Model Knowledge in Object Affordance Grounding
* Stratified Avatar Generation from Sparse Observations
* Streaming Dense Video Captioning
* StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
* (Street) Lights Will Guide You: Georeferencing Nighttime Astronaut Photography of Earth
* stroke of genius: Predicting the next move in badminton, A
* StrokeFaceNeRF: Stroke-Based Facial Appearance Editing in Neural Radiance Field
* Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
* Stronger Impact of Extreme Heat Event on Vegetation Temperature Sensitivity under Future Scenarios with High-Emission Intensity
* Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
* Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
* Structure-Aware Sparse-View X-Ray 3D Reconstruction
* Structure-Guided Adversarial Training of Diffusion Models
* Structured Gradient-Based Interpretations via Norm-Regularized Adversarial Training
* Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
* Structured Sparse Back-propagation for Lightweight On-Device Continual Learning on Microcontroller Units
* Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition, A
* study of object recognition and tracking techniques for augmented reality applications, A
* Study of the Mixed Layer Warming Induced by the Barrier Layer in the Northern Bay of Bengal in 2013, A
* Study of TMA Aircraft Conflict-Free Routing and Operation: With Mixed Integer Linear Programming, Multi-Agent Path Finding, and Metaheuristic-Based Neighborhood Search, A
* Study on the Relationship between Groundwater and Land Subsidence in Bangladesh Combining GRACE and InSAR
* Study on the Spatiotemporal Distribution and Usage Pattern of Dockless Shared Bicycles: The Case of Nanjing, A
* STVchrono Dataset: Towards Continuous Change Recognition in Time, The
* Style Aligned Image Generation via Shared Attention
* Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
* Style Injection in Diffusion: A Training-Free Approach for Adapting Large-Scale Diffusion Models for Style Transfer
* Style Transfer for 2D Talking Head Generation
* StyleCineGAN: Landscape Cinemagraph Generation Using a Pre-trained StyleGAN
* StyLitGAN: Image-Based Relighting via Latent Control
* Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality
* Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos
* Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion, A
* SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
* SUBTLE: An Unsupervised Platform with Temporal Link Embedding that Maps Animal Behavior
* SUGAR: Pre-training 3D Visual Representations for Robotics
* SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
* Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
* Summer Chukchi Sea Near-Surface Salinity Variability in Satellite Observations and Ocean Models
* SUNDIAL: 3D Satellite Understanding through Direct, Ambient, and Complex Lighting Decomposition
* Super-resolution of biomedical volumes with 2D supervision
* Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
* SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models
* SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration
* Superpixel-Guided Multi-Type Rail Segmentation via Contextual Information Aggregation
* SuperPrimitive: Scene Reconstruction at a Primitive Level
* SuperSVG: Superpixel-Based Scalable Vector Graphics Synthesis
* Supervised Anomaly Detection for Complex Industrial Images
* Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-Spoofing
* Supervised Learning-Based Prediction of Lightning Probability in the Warm Season
* Supporting Human-Robot Interaction by Projected Augmented Reality and a Brain Interface
* Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
* SURE: SUrvey REcipes for Building Reliable and Robust Deep Networks
* Surface Reconstruction from SLAM-Based Point Clouds: Results from the Datasets of the 2023 SIFET Benchmark
* SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
* SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
* Survey of Video Datasets for Grounded Event Understanding, A
* Survey on 3D Egocentric Human Pose Estimation, A
* SVCE: Shapley Value Guided Counterfactual Explanation for Machine Learning-Based Autonomous Driving
* SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
* SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
* SVGDreamer: Text Guided SVG Generation with Diffusion Model
* Swarm Investigation of Ultra-Low-Frequency (ULF) Pulsation and Plasma Irregularity Signatures Potentially Associated with Geophysical Activity
* SWFormer: Stochastic Windows Convolutional Transformer for Hybrid Modality Hyperspectral Classification
* Swift Parameter-free Attention Network for Efficient Super-Resolution
* SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
* Swin-chart: An efficient approach for chart classification
* SwinFuSR: an image fusion-inspired model for RGB-guided thermal image super-resolution
* SwitchLight: Co-Design of Physics-Driven Architecture and Pre-training Framework for Human Portrait Relighting
* Symbolization of Regional Elements Based on Local-Chronicle Text Mining and Image-Feature Extraction, The
* Symmetric Multi-View Subspace Clustering With Automatic Neighbor Discovery
* Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
* SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
* SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
* Synergistic Global-Space Camera and Human Reconstruction from Videos
* Synergistic Potential of Optical and Radar Remote Sensing for Snow Cover Monitoring
* SynFog: A Photorealistic Synthetic Fog Dataset Based on End-to-End Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
* SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
* Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
* Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
* Synthetic Data for Video Surveillance Applications of Computer Vision: A Review
* SyntStereo2Real: Edge-Aware GAN for Remote Sensing Image-to-Image Translation while Maintaining Stereo Constraint
* Systematic Approach Toward Robust Perception in (Semi-)Autonomous Vehicles, A
* Systematic comparison of semi-supervised and self-supervised learning for medical image classification
* T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos
* T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
* T2FNorm: Train-time Feature Normalization for OOD Detection in Image Classification
* T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
* T2VBench: Benchmarking Temporal Dynamics for Text-to-Video Generation
* T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-Specific Token Memory
* TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks
* Table tennis ball spin estimation with an event camera
* Tackling Domain Shifts in Person Re-Identification: A Survey and Analysis
* Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization
* Tackling the Satellite Downlink Bottleneck with Federated Onboard Learning of Image Compression
* Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
* TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
* Tactile-Augmented Radiance Fields
* Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
* TAKD: Target-Aware Knowledge Distillation for Remote Sensing Scene Classification
* TAME: Task Agnostic Continual Learning using Multiple Experts
* Taming Mode Collapse in Score Distillation for Text-to-3D Generation
* Taming Self-Training for Open-Vocabulary Object Detection
* Taming Stable Diffusion for Text to 360° Panorama Image Generation
* Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
* TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
* Target Before Shooting: Accurate Anomaly Detection and Localization Under One Millisecond via Cascade Patch Retrieval
* Target Detection Algorithm Based on Fusing Radar with a Camera in the Presence of a Fluctuating Signal Intensity, A
* Targeted Representation Alignment for Open-World Semi-Supervised Learning
* TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
* Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models
* Task-Adaptive Saliency Guidance for Exemplar-Free Class Incremental Learning
* Task-Aligned Part-Aware Panoptic Segmentation Through Joint Object-Part Representations
* Task-Aware Encoder Control for Deep Video Compression
* Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning
* Task-Customized Mixture of Adapters for General Image Fusion
* Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
* Task-Driven Wavelets Using Constrained Empirical Risk Minimization
* Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships
* TattTRN: Template Reconstruction Network for Tattoo Retrieval
* TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
* TCP: Textual-Based Class-Aware Prompt Tuning for Visual-Language Model
* TE-LSTM: A Prediction Model for Temperature Based on Multivariate Time Series Data
* TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
* TEA: Test-Time Energy Adaptation
* TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos
* Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
* Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment Based on Multi-Scale Aggregation and Anthropic Prior Knowledge
* Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
* TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
* Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
* Temporal surface frame anomalies for deepfake video detection
* Temporally Consistent Enhancement of Low-Light Videos via Spatial-Temporal Compatible Learning
* Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
* Tensor Based Unified Framework for Streaming Signal Online Analysis, A
* Tensor Low-Rank Graph Embedding and Learning for One-Step Incomplete Multi-View Clustering
* Test Time Training for Industrial Anomaly Segmentation
* Test-Time Adaptation for Depth Completion
* Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation
* Test-time Assessment of a Model's Performance on Unseen Domains via Optimal Transport
* Test-Time Domain Generalization for Face Anti-Spoofing
* Test-Time Linear Out-of-Distribution Detection
* Test-time Specialization of Dynamic Neural Networks
* Test-Time Zero-Shot Temporal Action Localization
* Testing the Impact of Pansharpening Using PRISMA Hyperspectral Data: A Case Study Classifying Urban Trees in Naples, Italy
* TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
* TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
* TexOct: Generating Textures of 3D Models with Octree-based Diffusion
* Text Grouping Adapter: Adapting Pre-Trained Text Detector for Layout Analysis
* Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
* Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
* Text-Conditional Attribute Alignment Across Latent Spaces for 3D Controllable Face Image Synthesis
* Text-Conditioned Generative Model of 3D Strand-Based Human Hairstyles
* Text-Driven Image Editing via Learnable Regions
* Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
* Text-Enhanced Data-Free Approach for Federated Class-Incremental Learning
* Text-Guided 3D Face Synthesis: From Generation to Editing
* Text-Guided Explorable Image Super-Resolution
* Text-Guided Multi-Class Multi-Object Tracking for Fine-Grained Maritime Rescue
* Text-Guided Prototype Generation for Occluded Person Re-Identification
* Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
* Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
* Text-Image Alignment for Diffusion-Based Perception
* Text-to-3D Generation with Bidirectional Diffusion Using Both 2D and 3D Priors
* Text-to-3D using Gaussian Splatting
* Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
* Text2HOI: Text-Guided 3D Motion Generation for Hand-Object Interaction
* Text2Loc: 3D Point Cloud Localization from Natural Language
* Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
* TextAdapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition
* TextCraftor: Your Text Encoder can be Image Quality Controller
* TexTile: A Differentiable Metric for Texture Tileability
* TextNeRF: A Novel Scene-Text Image Synthesis Method Based on Neural Radiance Fields
* Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
* TextureDreamer: Image-Guided Texture Synthesis through Geometry-Aware Diffusion
* TexVocab: Texture Vocabulary-Conditioned Human Avatars
* TF-REF-RNN: Time-Frequency and Reference Signal Feature Fusion Recurrent Neural Network for Underwater Backscatter Signal Separation
* TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
* TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation
* TG-Pose: Delving Into Topology and Geometry for Category-Level Object Pose Estimation
* Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
* Theory of Joint Light and Heat Transport for Lambertian Scenes, A
* Thermal Image Super-Resolution Challenge Results: PBVS 2024
* Thin Cloud Removal Generative Adversarial Network Based on Sparse Transformer in Remote Sensing Images
* Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
* Third Monocular Depth Estimation Challenge, The
* Thorough Understanding and 3D Super-Resolution Imaging for Forward-Looking Missile-Borne SAR via a Maneuvering Trajectory
* Three Pillars Improving Vision Foundation Model Distillation for Lidar
* Three-Dimensional Reconstruction of Partially Coherent Scatterers Using Iterative Sub-Network Generation Method
* Three-Dimensional Reconstruction of Zebra Crossings in Vehicle-Mounted LiDAR Point Clouds
* THRONE: An Object-Based Hallucination Benchmark for the Free-Form Generations of Large Vision-Language Models
* TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
* Tidal Flat Extraction and Analysis in China Based on Multi-Source Remote Sensing Image Collection and MSIC-OA Algorithm
* TIDE: Test-Time Few-Shot Object Detection
* TIG: A Multitask Temporal Interval Guided Framework for Key Frame Detection
* TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
* TIM: A Time Interval Machine for Audio-Visual Action Recognition
* Time-, Memory- and Parameter-Efficient Visual Adaptation
* Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
* Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification
* TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
* TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
* TirSA: A Three Stage Approach for UAV-Satellite Cross-View Geo-Localization Based on Self-Supervised Feature Enhancement
* TLS-MWP: A Tensor-Based Long- and Short-Range Convolution for Multiple Weather Prediction
* TlTScore: Towards Long-Tail Effects in Text-to-Visual Evaluation with Generative Foundation Models
* TMP: Temporal Motion Propagation for Online Video Super-Resolution
* To Boost Zero-Shot Generalization for Embodied Reasoning With Vision-Language Pre-Training
* To Err is Automation: Can Trust be Repaired by the Automated Driving System After its Failure?
* Token Transformation Matters: Towards Faithful Post-Hoc Explanation for Vision Transformer
* TokenCompose: Text-to-Image Diffusion with Token-Level Supervision
* TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
* ToNNO: Tomographic Reconstruction of a Neural Network's Output for Weakly Supervised Segmentation of 3D Medical Images
* ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing
* Total Selfie: Generating Full-Body Selfies
* Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
* Toward C-V2X Enabled Connected Transportation System: RSU-Based Cooperative Localization Framework for Autonomous Vehicles
* Toward Efficient and Secure Object Detection With Sparse Federated Training Over Internet of Vehicles
* Toward Explainable End-to-End Driving Models via Simplified Objectification Constraints
* Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories
* Toward Generalist Anomaly Detection via In-Context Residual Learning with Few-Shot Sample Prompts
* Toward High-Accuracy and Real-Time Two-Stage Small Object Detection on FPGA
* Toward Interactive Image Inpainting via Robust Sketch Refinement
* Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography
* Toward Real-World Blind Face Restoration With Generative Diffusion Prior
* Toward Smart Skies: Reviewing the State of the Art and Challenges for Intelligent Air Transportation Systems (IATS)
* Towards 3D Vision with Low-Cost Single-Photon Cameras
* Towards a Perceptual Evaluation Framework for Lighting Estimation
* Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
* Towards Accurate and Robust Architectures via Neural Architecture Search
* Towards Accurate Post-Training Quantization for Diffusion Models
* Towards Automated Movie Trailer Generation
* Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
* Towards Backward-Compatible Continual Learning of Image Compression
* Towards Better Vision-Inspired Vision-Language Models
* Towards Calibrated Multi-Label Deep Neural Networks
* Towards CLIP-Driven Language-Free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
* Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation
* Towards Diverse Binary Segmentation via a Simple yet General Gated Network
* Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
* Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
* Towards Efficient Machine Unlearning with Data Augmentation: Guided Loss-Increasing (GLI) to Prevent the Catastrophic Model Utility Drop
* Towards Efficient Replay in Federated Incremental Learning
* Towards Engineered Safe AI with Modular Concept Models
* Towards Explainable Visual Vessel Recognition Using Fine-Grained Classification and Image Retrieval
* Towards Fairness-Aware Adversarial Learning
* Towards General Robustness Verification of MaxPool-Based Convolutional Neural Networks via Tightening Linear Approximation
* Towards Generalizable Multi-Object Tracking
* Towards Generalizable Tumor Synthesis
* Towards Generalizing to Unseen Domains with Few Labels
* Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
* Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
* Towards High-Quality Photorealistic Image Style Transfer
* Towards Language-Driven Video Inpainting via Multimodal Large Language Models
* Towards Large-Scale 3D Representation Learning with Multi-Dataset Point Prompt Training
* Towards Learning a Generalist Model for Embodied Navigation
* Towards Learning Image Similarity from General Triplet Labels
* Towards Memorization-Free Diffusion Models
* Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
* Towards More Accurate Diffusion Model Acceleration with a Timestep Tuner
* Towards More Unified In-Context Visual Understanding
* Towards Online Real-Time Memory-based Video Inpainting Transformers
* Towards Progressive Multi-Frequency Representation for Image Warping
* Towards Quantitative Evaluation Metrics for Image Editing Approaches
* Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
* Towards Real-world Video Face Restoration: A New Benchmark
* Towards Realistic Scene Generation with LiDAR Diffusion Models
* Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
* Towards Robust 3D Pose Transfer with Adversarial Learning
* Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
* Towards Robust Learning to Optimize with Theoretical Guarantees
* Towards Robust Person Re-Identification by Adversarial Training With Dynamic Attack Strategy
* Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
* Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
* Towards Text-guided 3D Scene Composition
* Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
* Towards Transferable Targeted 3D Adversarial Attack in the Physical World
* Towards Understanding and Improving Adversarial Robustness of Vision Transformers
* Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
* Towards Variable and Coordinated Holistic Co-Speech Motion Generation
* Towards Weakly-Supervised Domain Adaptation for Lane Detection
* Traceable Federated Continual Learning
* Tracking and Counting Apples in Orchards Under Intermittent Occlusions and Low Frame Rates
* Tracking Control for High-Speed Train With Coupler Constraints
* Tracking Phytoplankton Biomass Amid Wildfire Smoke Interference Using Landsat 8 OLI
* Tracking the Development of Lit Fisheries by Using DMSP/OLS Data in the Open South China Sea
* Tracklet-based Explainable Video Anomaly Localization
* Traffic Parameters Estimation With Partial Vehicle Trajectories by the Iterative Partial Backpropagation Maximum Likelihood Estimation (IPB-MLE) Framework
* Traffic Scene Parsing Through the TSP6K Dataset
* TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
* Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
* Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
* Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
* Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
* Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution
* Training Vision Transformers for Semi-Supervised Semantic Segmentation
* Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
* Training-Free Pretrained Model Merging
* TrajFine: Predicted Trajectory Refinement for Pedestrian Trajectory Forecasting
* Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
* Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
* Transcriptomics-Guided Slide Representation Learning in Computational Pathology
* Transductive Transfer Learning-Assisted Hybrid Deep Learning Model for Accurate State of Charge Estimation of Li-Ion Batteries in Electric Vehicles
* Transductive Zero-Shot and Few-Shot CLIP
* Transfer CLIP for Generalizable Image Denoising
* Transferable and Principled Efficiency for Open-Vocabulary Segmentation
* Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training
* Transferring Annotator- and Instance-Dependent Transition Matrix for Learning From Crowds
* Transformer-Based Spatio-Temporal Unsupervised Traffic Anomaly Detection in Aerial Videos
* Transformer-Unet Generative Adversarial Network for the Super-Resolution Reconstruction of DEMs, A
* Transformers for Orbit Determination Anomaly Detection and Classification
* TransLoc4D: Transformer-Based 4D Radar Place Recognition
* Transmission Line Component Defect Detection Based on UAV Patrol Images: A Self-Supervised HC-ViT Method
* TransNeXt: Robust Foveal Visual Perception for Vision Transformers
* Transportation Mode Recognition Based on Low-Rate Acceleration and Location Signals With an Attention-Based Multiple-Instance Learning Network
* Travel Demand Forecasting: A Fair AI Approach
* TreeSeg: A Toolbox for Fully Automated Tree Crown Segmentation Based on High-Resolution Multispectral UAV Data
* Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
* Tri-Perspective view Decomposition for Geometry-Aware Depth Completion
* Tri-VAE: Triplet Variational Autoencoder for Unsupervised Anomaly Detection in Brain Tumor MRI
* Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments
* TRINS: Towards Multimodal Language Models that Can Read
* TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
* Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
* Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
* TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
* TULIP: Multi-Camera 3D Precision Assessment of Parkinson's Disease
* TULIP: Transformer for Upsampling of LiDAR Point Clouds
* Tumor Micro-Environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-Slide Pathological Images
* TUMTraf V2X Cooperative Perception Dataset
* Tune-an-Ellipse: CLIP Has Potential to Find What You Want
* Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
* Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
* TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light
* TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
* Two Stage Dehazing Framework for Dense and Non-Homogeneous Dehazing
* Two-Dimensional Direction Finding for L-Shaped Coprime Array via Minimization of the Ratio of the Nuclear Norm and the Frobenius Norm
* Two-Dimensional Legendre Polynomial Method for Internal Tide Signal Extraction
* Two-Layer MPC Architecture for Efficient Mixed-Integer-Informed Obstacle Avoidance in Real-Time
* Two-Person Interaction Augmentation with Skeleton Priors
* Two-Stage Personalized Virtual Try-On Framework With Shape Control and Texture Guidance, A
* Two-Stream Temporal Feature Aggregation Based on Clustering for Few-Shot Action Recognition
* Tyche: Stochastic in-Context Learning for Medical Image Segmentation
* U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
* UAV Quantitative Remote Sensing of Riparian Zone Vegetation for River and Lake Health Assessment: A Review
* UAV Swarm Target Identification and Quantification Based on Radar Signal Independency Characterization
* UAV-Based Phenotyping: A Non-Destructive Approach to Studying Wheat Growth Patterns for Crop Improvement and Breeding Programs
* UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery
* UDAC: Under-Display Array Cameras
* UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
* UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
* UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
* UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
* UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
* ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding
* UltraAugment: Fan-shape and Artifact-based Data Augmentation for 2D Ultrasound Images
* UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
* Unbiased Estimator for Distorted Conics in Camera Calibration
* Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
* Uncertainty Estimation for Tumor Prediction with Unlabeled Data
* Uncertainty Visualization via Low-Dimensional Posterior Projections
* Uncertainty-aware Action Decoupling Transformer for Action Anticipation
* Uncertainty-Aware Active Domain Adaptive Salient Object Detection
* Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
* Uncertainty-based Forgetting Mitigation for Generalized Few-Shot Object Detection
* Uncertainty-Guided Never-Ending Learning to Drive
* Uncovering Hidden Emotions with Adaptive Multi-Attention Graph Networks
* Uncovering the Hidden Cost of Model Compression
* Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
* Understanding and Improving Source-Free Domain Adaptation from a Theoretical Perspective
* Understanding Correlated Information Diffusion: From a Graphical Evolutionary Game Perspective
* Understanding ReLU Network Robustness Through Test Set Certification Performance
* Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
* Understanding Video Transformers via Universal Concept Discovery
* Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
* Ungeneralizable Examples
* UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
* UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic Cross-Modal Learnable Prompts
* UniDepth: Universal Monocular Metric Depth Estimation
* Unified and Interpretable Emotion Representation and Expression Generation, A
* Unified and Real-Time Image Geo-Localization via Fine-Grained Overlap Estimation
* Unified Approach for Text- and Image-Guided 4D Scene Generation, A
* Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals, A
* Unified Entropy Optimization for Open-Set Test-Time Adaptation
* Unified Face Attack Detection with Micro Disturbance and a Two-Stage Training Strategy
* Unified Framework for Human-centric Point Cloud Video Understanding, A
* Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning, A
* Unified Language-Driven Zero-Shot Domain Adaptation
* Unified Physical-Digital Attack Detection Challenge
* Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
* Uniform Color Space with Advanced Hue Linearity: PCS23-UCS
* Unifying Automatic and Interactive Matting with Pretrained ViTs
* Unifying Building Instance Extraction and Recognition in UAV Images
* Unifying Correspondence, Pose and NeRF for Generalized Pose-Free Novel View Synthesis
* Unifying Top-Down and Bottom-Up Scanpath Prediction Using Transformers
* UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
* UniGS: Unified Representation for Image Generation and Segmentation
* UniHuman: A Unified Model For Editing Human Images in the Wild
* UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
* Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction
* UniMODE: Unified Monocular 3D Object Detection
* UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
* UniPAD: A Universal Pre-Training Paradigm for Autonomous Driving
* UniParser: Multi-Human Parsing With Unified Correlation Representation Learning
* UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
* UniPTS: A Unified Framework for Proficient Post-Training Sparsity
* UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
* Universal Dehazing via Haze Style Transfer
* Universal Novelty Detection Through Adaptive Contrastive Learning
* Universal Protocol to Benchmark Camera Calibration for Sports, A
* Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
* Universal Segmentation at Arbitrary Granularity with Language Instruction
* Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias
* Universal Snow Avalanche Modeling Index Based on SAFI-Flow-R Approach in Poorly-Gauged Regions
* UniVS: Unified and Universal Video Segmentation with Prompts as Queries
* Unknown Prompt, the Only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
* Unknown Sample Discovery for Source Free Open Set Domain Adaptation
* Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
* Unleashing Network Potentials for Semantic Scene Completion
* Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
* Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
* Unlocking Pre-Trained Image Backbones for Semantic Image Synthesis
* Unlocking the Potential of Pre-Trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
* Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
* Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-Based Hyperspectral Image Synthesis
* Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
* UnO: Unsupervised Occupancy Fields for Perception and Forecasting
* Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
* Unravelling Robustness of Deep Face Recognition Networks Against Illicit Drug Abuse Images
* Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement, The
* UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
* UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
* Unsegment Anything by Simulating Deformation
* Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
* Unsupervised 3D Structure Inference from Category-Specific Image Collections
* Unsupervised Adaptation Learning for Real Multiplatform Hyperspectral Image Denoising
* Unsupervised Blind Image Deblurring Based on Self-Enhancement
* Unsupervised Deep Unrolling Networks for Phase Unwrapping
* Unsupervised Domain Adaptation Architecture Search with Self-Training for Land Cover Mapping
* Unsupervised Domain Adaptation for Multi-Stain Cell Detection in Breast Cancer with Transformers
* Unsupervised Domain Adaptation for Weed Segmentation Using Greedy Pseudo-labelling
* Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training
* Unsupervised Feature Learning with Emergent Data-Driven Prototypicality
* Unsupervised Gaze Representation Learning from Multi-view Face Images
* Unsupervised Generative Fake Image Detector
* Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement
* Unsupervised Keypoints from Pretrained Diffusion Models
* Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
* Unsupervised Low-Light Image Enhancement via Feature Smoothing and Curve Regression
* Unsupervised Microscopy Video Denoising
* Unsupervised Multi-Person 3D Human Pose Estimation From 2D Poses Alone
* Unsupervised Occupancy Learning from Sparse Point Cloud
* Unsupervised Reinforcement Learning for Multi-Task Autonomous Driving: Expanding Skills and Cultivating Curiosity
* Unsupervised Salient Instance Detection
* Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
* Unsupervised Template-assisted Point Cloud Shape Correspondence Network
* Unsupervised Universal Image Segmentation
* Unsupervised Video Anomaly Detection Method via Optical Flow Decomposition and Spatio-Temporal Feature Learning, An
* Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
* Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
* Unveiling the Ambiguity in Neural Inverse Rendering: A Parameter Compensation Analysis
* Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual Learning
* Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions Through Masked Modeling
* Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation
* UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration
* UP-NAS: Unified Proxy for Neural Architecture Search
* Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning, An
* Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
* Urban Internal Network Structure and Resilience Characteristics from the Perspective of Population Mobility: A Case Study of Nanjing, China
* Urban Perception Evaluation and Street Refinement Governance Supported by Street View Visual Elements Analysis
* Urban-Rural Exposure to Flood Hazard and Social Vulnerability in the Conterminous United States
* UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping
* URHand: Universal Relightable Hands
* Use of SLSTR Sea Surface Temperature Data in OSTIA as a Reference Sensor: Implementation and Validation
* USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
* Using Counterfactual Information for Breast Classification Diagnosis
* Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
* Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms
* Utility-Fairness Trade-Offs and How to Find Them
* Utilizing Machine Learning and Multi-Station Observations to Investigate the Visibility of Sea Fog in the Beibu Gulf
* uTRAND: Unsupervised Anomaly Detection in Traffic Trajectories
* UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
* UVaT: Uncertainty Incorporated View-Aware Transformer for Robust Multi-View Classification
* UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
* UVIS: Unsupervised Video Instance Segmentation
* UWMamba: UnderWater Image Enhancement With State Space Model
* V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
* V-VIPE: Variational View Invariant Pose Embedding
* VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
* VADiffusion: Compressed Domain Information Guided Conditional Diffusion for Video Anomaly Detection
* Validating Privacy-Preserving Face Recognition Under a Minimum Assumption
* Validation and Error Minimization of Global Ecosystem Dynamics Investigation (GEDI) Relative Height Metrics in the Amazon
* Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
* VAREN: Very Accurate and Realistic Equine Network
* Variation in Glacier Albedo on the Tibetan Plateau between 2001 and 2022 Based on MODIS Data
* Variation in Vegetation Composition and Structure across Mudflat Areas in the Yellow River Delta, China
* VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
* VBench: Comprehensive Benchmark Suite for Video Generative Models
* VCoder: Versatile Vision Encoders for Multimodal Large Language Models
* VecFusion: Vector Font Generation with Diffusion
* Vector Graphics Generation via Mutually Impulsed Dual-Domain Diffusion
* Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability, A
* Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
* Versatile Navigation Under Partial Observability via Value-Guided Diffusion Policy
* Vertical Distribution of Water Vapor During Haze Processes in Northeast China Based on Raman Lidar Measurements
* VGGSfM: Visual Geometry Grounded Deep Structure from Motion
* VicTR: Video-conditioned Text Representations for Activity Recognition
* vid-TLDR: Training Free Token merging for Light-Weight Video Transformer
* Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach
* Video Based Computational Coding of Movement Anomalies in ASD Children
* Video Frame Interpolation via Direct Synthesis with the Event-based Reference
* Video Harmonization with Triplet Spatio-Temporal Variation Patterns
* Video Instance Shadow Detection Under the Sun and Sky
* Video Interaction Recognition using an Attention Augmented Relational Network and Skeleton Data
* Video Interpolation with Diffusion Models
* Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing, A
* Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
* Video Question Generation for Dynamic Changes
* Video ReCap: Recursive Captioning of Hour-Long Videos
* Video Recognition in Portrait Mode
* Video Representation Learning for Conversational Facial Expression Recognition Guided by Multiple View Reconstruction
* Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
* Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
* Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
* Video-P2P: Video Editing with Cross-Attention Control
* Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
* VideoBooth: Diffusion-based Video Generation with Image Prompts
* VideoCon: Robust Video-Language Alignment via Contrast Captions
* VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
* VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
* VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
* VideoLLM-online: Online Video Large Language Model for Streaming Video
* VideoMAC: Video Masked Autoencoders Meet ConvNets
* VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
* VideoSAGE: Video Summarization with Graph Representation Learning
* VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
* VidLA: Video-Language Alignment at Scale
* VidToMe: Video Token Merging for Zero-Shot Video Editing
* View from Above: Orthogonal-View Aware Cross-View Localization
* View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
* View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
* ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
* ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
* Viewpoint-Aware Visual Grounding in 3D Scenes
* ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
* VILA: On Pre-training for Visual Language Models
* Vim4Path: Self-Supervised Vision Mamba for Histopathology Images
* VINECS: Video-based Neural Character Skinning
* ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
* Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
* Virtual Inertia-Based Control and Stability Analysis for Proton Exchange Membrane Fuel Cell Hybrid Electric Vehicles
* Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?
* Vision Check-up for Language Models, A
* Vision-and-Language Navigation via Causal Learning
* Vision-language models for decoding provider attention during neonatal resuscitation
* Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning
* Vision-Transformer-Based Convex Variational Network for Bridge Pavement Defect Segmentation, A
* Vista-llama: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
* VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture
* Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
* Visual Concept Connectome (VCC): Open World Concept Discovery and Their Interlayer Connections in Deep Models
* Visual Delta Generator with Large Multi-Modal Models for Semi-Supervised Composed Image Retrieval
* Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
* Visual in-Context Prompting
* Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
* Visual Object Tracking With Mutual Affinity Aligned to Human Intuition
* Visual Objectification in Films: Towards a New AI Task for Video Interpretation
* Visual Point Cloud Forecasting Enables Scalable Autonomous Driving
* Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
* Visual Programming for Zero-Shot Open-Vocabulary 3D Visual Grounding
* Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
* Visual Tuning
* Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
* visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks, A
* ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
* VIT-LENS: Towards Omni-modal Representations
* ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System
* ViTamin: Designing Scalable Vision Models in the Vision-Language Era
* ViTKD: Feature-based Knowledge Distillation for Vision Transformers
* ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
* VLG: General Video Recognition with Web Textual Knowledge
* VLM-PL: Advanced Pseudo Labeling approach for Class Incremental Object Detection via Vision-Language Model
* Vlogger: Make Your Dream A Vlog
* VLP: Vision Language Planning for Autonomous Driving
* VMC: Video Motion Customization Using Temporal Attention Adaption for Text-to-Video Diffusion Models
* VMCML: Video and Music Matching via Cross-Modality Lifting
* VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
* VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
* VoCo: A Simple-Yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
* VolRAFT: Volumetric Optical Flow Network for Digital Volume Correlation of Synchrotron Radiation-based Micro-CT Images of Bone-Implant Interfaces
* Volumetric Environment Representation for Vision-Language Navigation
* VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
* Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
* VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
* VRetouchEr: Learning Cross-Frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos
* VRP-SAM: SAM with Visual Reference Prompt
* VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
* VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
* VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
* VST++: Efficient and Stronger Visual Saliency Transformer
* VT-Former: An Exploratory Study on Vehicle Trajectory Prediction for Highway Surveillance through Graph Isomorphism and Transformer
* VTimeLLM: Empower LLM to Grasp Video Moments
* VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
* V_kD: Improving Knowledge Distillation Using Orthogonal Projections
* Wake-Sleep Energy Based Models for Continual Learning
* Wall Clutter Suppression Method Based on Amplitude Coherence Factor for MIMO Through-the-Wall Radar
* WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion
* WANDR: Intention-guided Human Motion Generation
* WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection
* WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
* Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
* WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
* Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
* WaveMo: Learning Wavefront Modulations to See Through Scattering
* Weak-to-Strong 3D Object Detection with X-Ray Distillation
* Weakly Misalignment-Free Adaptive Feature Alignment for UAVs-Based Multimodal Object Detection
* Weakly Supervised End2End Deep Visual Odometry
* Weakly Supervised Monocular 3D Detection with a Single-View Image
* Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
* Weakly Supervised Set-Consistency Learning Improves Morphological Profiling of Single-Cell Images
* Weakly Supervised Video Individual Counting
* Weakly-Supervised Audio-Visual Video Parsing with Prototype-Based Pseudo-Labeling
* Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation
* Weakly-Supervised Temporal Action Localization with Multi-Modal Plateau Transformers
* Weighted Contrastive Prototype Network for Few-Shot Hyperspectral Image Classification with Noisy Labels
* Weighted Intersection over Union (wIoU) for evaluating image segmentation
* Westward Migration of the Chenghai-Jinsha Drainage Divide and Its Implication for the Initiation of the Chenghai Fault
* WHAM: Reconstructing World-Grounded Humans with Accurate 3D Motion
* What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
* What does CLIP know about peeling a banana?
* What If the TV was off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
* What is Point Supervision Worth in Video Instance Segmentation?
* What Makes Deviant Places?
* What Makes Multimodal In-Context Learning Work?
* What Sketch Explainability Really Means for Downstream Tasks?
* What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
* What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
* What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
* When StyleGAN Meets Stable Diffusion: a W_+ Adapter for Personalized Image Generation
* When Visual Grounding Meets Gigapixel-Level Large-Scale Scenes: Benchmark and Approach
* Where Does the Devil Lie?: Multimodal Multitask Collaborative Revision Network for Trusted Road Segmentation
* Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
* Wide-Area UAV Networks Cooperative Positioning Algorithm Based on Information Geometry
* Widely Linear Momentum LMS Algorithm for Second Order Noncircular Signals and Performance Analysis
* Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
* WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
* WinSyn: A High Resolution Testbed for Synthetic Data
* Wired Perspectives: Multi-View Wire Art Embraces Generative AI
* Wirtinger Calculus-Based Expectation Propagation in Latent Variable Models Applied to Grant-Free NOMA
* Wonder3D: Single Image to 3D Using Cross-Domain Diffusion
* WonderJourney: Going from Anywhere to Everywhere
* WorDepth: Variational Language Prior for Monocular Depth Estimation
* WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
* Would Deep Generative Models Amplify Bias in Future Models?
* WWW: A Unified Framework for Explaining what, Where and why of Neural Networks by Interpretation of Neuron Concepts
* X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model
* X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
* X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
* X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models
* XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
* XFeat: Accelerated Features for Lightweight Image Matching
* XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
* XoFTR: Cross-modal Feature Matching Transformer
* XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
* YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images
* YOLO-World: Real-Time Open-Vocabulary Object Detection
* YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
* YOLOX-RDD: A Method of Anchor-Free Road Damage Detection for Front-View Images
* You Only Need Less Attention at Each Stage in Vision Transformers
* You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
* Your Image Is My Video: Reshaping the Receptive Field via Image-to-Video Differentiable AutoAugmentation and Fusion
* Your Student is Better than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
* Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
* YOWOv3: A Lightweight Spatio-Temporal Joint Network for Video Action Detection
* Z*: Zero-shot Style Transfer via Attention Reweighting
* ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
* ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images
* Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
* Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
* Zero-Shot Audio-Visual Compound Expression Recognition Method based on Emotion Probability Fusion
* Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
* Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
* Zero-Shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
* Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
* Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
* Zero-TPrune: Zero-Shot Token Pruning Through Leveraging of the Attention Graph in Pre-Trained Transformers
* ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
* ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
* ZeroShape: Regression-Based Zero-Shot Shape Reconstruction
* ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions
* ZONE: Zero-Shot Instruction-Guided Local Editing
4768 for 2410

Index for "2"


Last update: 26-Nov-24 17:17:54
Use price@usc.edu for comments.