MMMod12 * *Advances in Multimedia Modeling
* AAU Video Browser: Non-Sequential Hierarchical Video Browsing without Content Analysis
* Active Cleaning for Video Corpus Annotation
* Annotated Free-Hand Sketches for Video Retrieval Using Object Semantics and Motion
* Asserting the Precise Position of 3D and Multispectral Acquisition Systems for Multisensor Registration Applied to Cultural Heritage Analysis
* Building Semantic Hierarchies Faithful to Image Semantics
* Challenges in Storing Multimedia Data for the Future: An Overview
* Client-Driven Price Selection for Scalable Video Streaming with Advertisements
* Clipboard: A Visual Search and Browsing Engine for Tablet and PC
* Colorization Using Quaternion Algebra with Automatic Scribble Generation
* Combining Image-Level and Segment-Level Models for Automatic Annotation
* Content Based Image Retrieval Using Bag-Of-Regions
* Content-Based Video Description for Automatic Video Genre Categorization
* Context-Aware Querying for Multimodal Search Engines
* Do-It-Yourself Eye Tracker: Low-Cost Pupil-Based Eye Tracker for Computer Graphics Applications
* Double Fusion for Multimedia Event Detection
* Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval
* Efficient Spatio-Temporal Edge Descriptor
* Efficient Storage and Decoding of SURF Feature Points
* EMIR: A Novel Music Retrieval System for Mobile Devices Incorporating Analysis of User Emotion
* Enhancing the User Experience with the Sensory Effect Media Player and AmbientLib
* FascinatE Production Scripting Engine, The
* Fast GPU-Based Motion Estimation Algorithm for H.264/AVC, A
* Fast Mode Decision Algorithm for H.264/AVC-to-SVC Transcoding with Temporal Scalability
* Film Comic Reflecting Camera-Works
* Finding Suits in Images of People
* Forward Wyner-Ziv Fast Video Decoding Using Multicore Processors
* Fusing Template and Point Information to Track Planes with Large Interframe Displacement
* Fuzzy Rank-Based Late Fusion Method for Image Retrieval, A
* Gait-Based Action Recognition via Accelerated Minimum Incremental Coding Length Classifier
* Hairstyle Suggestion Using Statistical Learning
* Hierarchical Navigation and Visual Search for Video Keyframe Retrieval
* How to Select and Customize Object Recognition Approaches for an Application?
* Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags
* Improving Cluster Selection and Event Modeling in Unsupervised Mining for Automatic Audiovisual Video Structuring
* Improving Item Recommendation Based on Social Tag Ranking
* Investigating Gesture and Pressure Interaction with a 3D Display
* Key-Frame-Oriented Video Browser, A
* Large-Scale Similarity-Based Join Processing in Multimedia Databases
* Linking User Generated Video Annotations to the Web of Data
* Low Complexity Macroblock Layer Rate Control Scheme Base on Weighted-Window for H.264 Encoder, A
* Multi-layer Local Graph Words for Object Recognition
* Multi-modal Solution for Unconstrained News Story Retrieval
* Multimedia Retrieval Framework Based on Automatic Graded Relevance Judgments, A
* Multimodal Cue Detection Engine for Orchestrated Entertainment
* Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning
* Novel Multi-modal Integration and Propagation Model for Cross-Media Information Retrieval, A
* On Stability of Adaptive Similarity Measures for Content-Based Image Retrieval
* On Video Recommendation over Social Network
* Optimizing Multimedia Retrieval Using Multimodal Fusion and Relevance Feedback Techniques
* OVIDIUS: A Web Platform for Video Browsing and Search
* Pedestrian Attribute Analysis Using a Top-View Camera in a Public Space
* Place Recognition via 3D Modeling for Personal Activity Lifelog Using Wearable Camera
* Real-Time Life Experience Logging Tool, A
* Real-Time Visualizations of Gigapixel Texture Data Sets Using HTML5
* Recurring Element Detection in Movies
* Retrieval of Multiple Instances of Objects in Videos
* RGB-D Based Multi-attribute People Search in Intelligent Visual Surveillance
* Scalable Mobile-to-Mobile Video Communications Based on an Improved WZ-to-SVC Transcoder
* Scene Signatures for Unconstrained News Video Stories
* Sensor-Based Analysis of User Generated Video for Multi-camera Video Remixing
* Sequence Kernels for Clustering and Visualizing Near Duplicate Video Segments
* Summarization and Presentation of Real-Life Events Using Community-Contributed Content
* Symbiotic Black-Box Tracker
* Template Matching and Monte Carlo Markova Chain for People Counting under Occlusions
* TempoM2: A Multi Feature Index Structure for Temporal Video Search
* Topic Based Query Suggestions for Video Search
* Towards Automatic Detection of CBIRs Configuration
* Towards Category-Based Aesthetic Models of Photographs
* Tracking Persons in Ultra-HD Panoramic Video
* U-Drumwave: An Interactive Performance System for Drumming
* Ultimate Immersive Experience: Panoramic 3D Video Acquisition, The
* Video Browser Showdown by NUS
* Video Browsing Tool for Content Management in Media Post-Production, A
* Video Browsing with a 3D Thumbnail Ring Arranged by Color Similarity
* Video Face Book, The
* Video Summarization Based on Balanced AV-MMR
* Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification
* Visual-Based Spatiotemporal Analysis for Nighttime Vehicle Braking Event Detection
* Workflow Activity Monitoring Using Dynamics of Pair-Wise Qualitative Spatial Relations
80 for MMMod12

MMMod14 * *Advances in Multimedia Modeling
* 3D Object Classification Using Deep Belief Networks
* Affect Recognition Using Magnitude Models of Motion
* Approximating the Signature Quadratic Form Distance Using Scalable Feature Signatures
* Audio-Visual Classification Video Browser
* Average Precision: Good Guide or False Friend to Multimedia Search Effectiveness?
* Browsing Linked Video Collections for Media Production
* Coherence Analysis of Metrics in LBP Space for Interactive Face Retrieval
* Collections for Automatic Image Annotation and Photo Tag Recommendation
* Comparative Study on the Use of Multi-label Classification Techniques for Concept-Based Video Indexing and Annotation, A
* Content-Based Video Browsing with Collaborating Mobile Clients
* Coverage Field Analysis to the Quality of Light Field Rendering
* Data-Driven Personalized Digital Ink for Chinese Characters, A
* Dense SURF and Triangulation Based Spatio-temporal Feature for Action Recognition, A
* Effects of Audio Compression on Chord Recognition
* Empirical Exploration of Extreme SVM-RBF Parameter Values for Visual Object Classification
* Eolas: Video Retrieval Application for Helping Tourists
* Evaluation of Local Action Descriptors for Human Action Classification in the Presence of Occlusion, An
* Event Detection by Velocity Pyramid
* Evolution of Research on Multimedia Travel Guide Search and Recommender Systems, The
* EvoTunes: Crowdsourcing-Based Music Recommendation
* Exploitation of Gaze Data for Photo Region Labeling in an Immersive Environment
* Exploring Distance-Aware Weighting Strategies for Accurate Reconstruction of Voxel-Based 3D Synthetic Models
* Factor Selection for Reinforcement Learning in HTTP Adaptive Streaming
* Folkioneer: Efficient Browsing of Community Geotagged Images on a Worldwide Scale
* FoodCam: A Real-Time Mobile Food Recognition System Employing Fisher Vector
* Framework of Video Coding for Compressing Near-Duplicate Videos, A
* Fusing Appearance and Spatio-temporal Features for Multiple Camera Tracking
* Gait Based Gender Recognition Using Sparse Spatio Temporal Features
* Graph-Based Multimodal Clustering for Social Event Detection in Large Collections of Images
* Hierarchical Audio-Visual Surveillance for Passenger Elevators
* How Do Users Search with Basic HTML5 Video Players?
* Human Action Recognition in Video via Fused Optical Flow and Moment Features: Towards a Hierarchical Approach to Complex Scenario Recognition
* Hybrid Machine-Crowd Approach to Photo Retrieval Result Diversification, A
* Improved Similarity-Based Fast Coding Unit Depth Decision Algorithm for Inter-frame Coding in HEVC, An
* Improving Scene Detection Algorithms Using New Similarity Measures
* Interactive Device for Exploring Thematically Sorted Artworks, An
* Investigation into Feature Effectiveness for Multimedia Hyperlinking, An
* Joint People Recognition across Photo Collections Using Sparse Markov Random Fields
* Kinect vs. Low-cost Inertial Sensing for Gesture Recognition
* Learning to Infer Public Emotions from Large-Scale Networked Voice Data
* LIRE Request Handler: A Solr Plug-In for Large Scale Content Based Image Retrieval, The
* Live Key Frame Extraction in User Generated Content Scenarios for Embedded Mobile Platforms
* Local Segmentation for Pedestrian Tracking in Dense Crowds
* Location-Aware Music Artist Recommendation
* Low-Complexity Rate-Distortion Optimization Algorithms for HEVC Intra Prediction
* Low-Cost Head and Eye Tracking System for Realistic Eye Movements in Virtual Avatars, A
* M3 + P3 + O3 = Multi-D Photo Browsing
* Mining the Web for Multimedia-Based Enriching
* MOSRO: Enabling Mobile Sensing for Real-Scene Objects with Grid Based Structured Output Learning
* MR Simulation for Re-wallpapering a Room in a Free-Hand Movie
* Muithu: A Touch-Based Annotation Interface for Activity Logging in the Norwegian Premier League
* Multi-view Action Synchronization in Complex Background
* Multiple Reference Frame Transcoding from H.264/AVC to HEVC
* New Saliency Model Using Intra Coded High Efficiency Video Coding (HEVC) Frames, A
* NII-UIT: A Tool for Known Item Search by Sequential Pattern Filtering
* Novel Approach for Semantics-Enabled Search of Multimedia Documents on the Web, A
* Novel Human Action Representation via Convolution of Shape-Motion Histograms, A
* Online Identification of Primary Social Groups
* Optimization Model for Aesthetic Two-Dimensional Barcodes, An
* Organising Crowd-Sourced Media Content via a Tangible Desktop Application
* Parameter-Free Inter-view Depth Propagation for Mobile Free-View Video
* Perceptual Characteristics of 3D Orientation, The
* Personal Media Reunion: Re-collecting Media Content Scattered over Smart Devices and Social Networks
* Personalized Recommendation by Exploring Social Users' Behaviors
* Perspective Multiscale Detection and Tracking of Persons
* Pursuing Detector Efficiency for Simple Scene Pedestrian Detection
* Random Matrix Ensembles of Time Correlation Matrices to Analyze Visual Lifelogs
* Real-Time Gaze Estimation Using a Kinect and a HD Webcam
* Real-Time Skeleton-Tracking-Based Human Action Recognition Using Kinect Data
* Real-World Event Detection Using Flickr Images
* Rebuilding Visual Vocabulary via Spatial-temporal Context Similarity for Video Retrieval
* RESIC: A Tool for Music Stretching Resistance Estimation
* Resource Constrained Multimedia Event Detection
* Robust Image Restoration via Reweighted Low-Rank Matrix Recovery
* Scenarizing CADastre Exquisse: A Crossover between Snoezeling in Hospitals/Domes, and Authoring/Experiencing Soundful Comic Strips
* Scenarizing Metropolitan Views: FlanoGraphing the Urban Spaces
* Segment and Label Indoor Scene Based on RGB-D for the Visually Impaired
* Semantic Based Background Music Recommendation for Home Videos
* Signature-Based Video Browser
* Smoke Detection Based on a Semi-supervised Clustering Model
* Sparse Patch Coding for 3D Model Retrieval
* Spatial Similarity Measure of Visual Phrases for Image Retrieval
* Spectral Classification of 3D Articulated Shapes
* Stixel on the Bus: An Efficient Lossless Compression Scheme for Depth Information in Traffic Scenarios
* Summarised Presentation of Personal Photo Sets
* Tag Relatedness Using Laplacian Score Feature Selection and Adapted Jensen-Shannon Divergence
* Task-Driven Image Retrieval Using Geographic Information
* Tell Me about TV Commercials of This Product
* Tools for User Interaction in Immersive Environments
* TravelBuddy: Interactive Travel Route Recommendation with a Visual Scene Interface
* Understanding Affective Content of Music Videos through Learned Representations
* User Intentions in Digital Photo Production: A Test Data Set
* VERGE: An Interactive Search Engine for Browsing Video Collections
* Video to Article Hyperlinking by Multiple Tag Property Exploration
* Visual Information Retrieval System for Radiology Reports and the Medical Literature, A
* Visual Recognition by Exploiting Latent Social Links in Image Collections
* Visual Saliency Weighting and Cross-Domain Manifold Ranking for Sketch-Based Image Retrieval
* Where Is the News Breaking? Towards a Location-Based Event Detection Framework for Journalists
* Who's the Best Charades Player? Mining Iconic Movement of Semantic Concepts
* Yoga Posture Recognition for Self-training
101 for MMMod14

MMMod15 * *Advances in Multimedia Modeling
* 3D Depth Perception from Single Monocular Images
* Aesthetic QR Codes Based on Two-Stage Image Blending
* Affective Music Recommendation System Based on the Mood of Input Video
* Analysis of Time Drift in Hand-Held Recording Devices, An
* AttRel: An Approach to Person Re-Identification by Exploiting Attribute Relationships
* Audio Secret Management Scheme Using Shamir's Secret Sharing
* Auditory Scene Classification with Deep Belief Network
* Automatic Chinese Personality Recognition Based on Prosodic Features
* Automatic Rib Segmentation Method on X-Ray Radiographs, An
* Azimuthal Perceptual Resolution Model Based Adaptive 3D Spatial Parameter Coding
* Binary Code Learning via Iterative Distance Adjustment
* Challenging Issues in Visual Information Understanding Researches
* Collaborative Browsing and Search in Video Archives with Mobile Clients
* Community Detection Based on Links and Node Features in Social Networks
* Computationally Efficient Algorithm for Large Scale Near-Duplicate Video Detection, A
* Concept-Based Multimodal Learning for Topic Generation
* Content-Based Discovery of Multiple Structures from Episodes of Recurrent TV Programs Based on Grammatical Inference
* Content-Based Image Retrieval with Gaussian Mixture Models
* Coupled Discriminant Multi-Manifold Analysis with Application to Low-Resolution Face Recognition
* Coupled-View Based Ranking Optimization for Person Re-identification
* Cross-Domain Concept Detection with Dictionary Coherence by Leveraging Web Images
* Cross-Modal Self-Taught Learning for Image Retrieval
* Discriminative Regions: A Substrate for Analyzing Life-Logging Image Sequences
* Dynamic Hierarchical Visualization of Keyframes in Endoscopic Video
* Dynamic User Authentication Based on Mouse Movements Curves
* Edge Direction-Based Fast Coding Unit Partition for HEVC Screen Content Coding
* Efficient Compression of Hyperspectral Images Using Optimal Compression Cube and Image Plane
* Efficient Hybrid Steganography Method Based on Edge Adaptive and Tree Based Parity Check, An
* Emotional Tone-Based Audio Continuous Emotion Recognition
* Enhanced Signature-Based Video Browser
* Facial Aging Simulator by Data-Driven Component-Based Texture Cloning
* Factorizing Time-Aware Multi-way Tensors for Enhancing Semantic Wearable Sensing
* Fast Human Activity Recognition in Lifelogging
* FISIR: A Flexible Framework for Interactive Search in Image Retrieval Systems
* Flat3D: Browsing Stereo Images on a Conventional Screen
* FOCUSING PATCH: Automatic Photorealistic Deblurring for Facial Images by Patch-Based Color Transfer
* Graph-Based Browsing for Large Video Collections
* Hessian Regularized Sparse Coding for Human Action Recognition
* Image Taken Place Estimation via Geometric Constrained Spatial Layer Matching
* ImageMap: Visually Browsing Millions of Images
* IMOTION: A Content-Based Video Retrieval Engine
* Improved Content-Based Music Recommending Method with Weighted Tags, An
* Improved Rate-Distortion Optimization Algorithms for HEVC Lossless Coding
* Improving Interactive Known-Item Search in Video with the Keyframe Navigation Tree
* Interactive Known-Item Search Using Semantic Textual and Colour Modalities
* Iron Maiden While Jogging, Debussy for Dinner?
* Is Your First Impression Reliable? Trustworthy Analysis Using Facial Traits in Portraits
* Large-Scale Image Mining with Flickr Groups
* Live Version Identification with Audio Scene Detection
* Making Lifelogging Usable: Design Guidelines for Activity Trackers
* MAP: Microblogging Assisted Profiling of TV Shows
* MemLog, an Enhanced Lifelog Annotation and Search Tool
* Mobile Image Analysis: Android vs. iOS
* Moving Object Tracking with Structure Complexity Coefficients
* Multi-Dimensional Data Model for Personal Photo Browsing, A
* Multi-instance Feature Learning Based on Sparse Representation for Facial Expression Recognition
* Multi-stripe Video Browser for Tablets, The
* Multiclass Boosting Framework for Multimodal Data Analysis
* Multidimensional Context Awareness in Mobile Devices
* Multimedia Social Event Detection in Microblog
* Multimodal Music Mood Classification by Fusion of Audio and Lyrics
* Muscular Movement Model Based Automatic 3D Facial Expression Recognition
* New Image Decomposition and Reconstruction Approach: Adaptive Fourier Decomposition, A
* NII-UIT Browser: A Multimodal Video Search System
* Non-negative Low-Rank and Group-Sparse Matrix Factorization
* Novel Error Concealment Algorithm for H.264/AVC, A
* Novel Fast Full Frame Video Stabilization via Three-Layer Model, A
* Novel Optimized Watermark Embedding Scheme for Digital Images, A
* Object Detection in Low-Resolution Image via Sparse Representation
* Online 3D Shape Segmentation by Blended Learning
* Orderless and Blurred Visual Tracking via Spatio-temporal Context
* Outdoor Air Quality Inference from Single Image
* Patch-Based Disparity Remapping for Stereoscopic Images
* Performance Evaluation of Students Using Multimodal Learning Systems
* Person Re-identification Using Data-Driven Metric Adaptation
* Personality Modeling Based Image Recommendation
* Photo Quality Assessment with DCNN that Understands Image Well
* Proxemic Multimedia Interaction over the Internet of Things, A
* Real-Time People Counting across Spatially Adjacent Non-overlapping Camera Views
* Real-Time People Counting Approach in Indoor Environment, A
* Recognition of Meaningful Human Actions for Video Annotation Using EEG Based User Responses
* Robust Attribute-Based Visual Recognition Using Discriminative Latent Representation
* Robust Multi-Label Image Classification with Semi-Supervised Learning and Active Learning
* Robust User Community-Aware Landmark Photo Retrieval
* Scaling and Cropping of Wavelet-Based Compressed Images in Hidden Domain
* Secure Client Side Watermarking with Limited Key Size
* Semantic Correlation Mining between Images and Texts with Global Semantics and Local Mapping
* Signal-Aware Parametric Quality Model for Audio and Speech over IP Networks
* Sliders Versus Storyboards: Investigating Interaction Design for Mobile Video Browsing
* SLOREV: Using Classical CAD Techniques for 3D Object Extraction from Single Photo
* Software Solution for HEVC Encoding and Decoding
* Sparsity-Based Occlusion Handling Method for Person Re-identification
* Storyboard-Based Interface for Mobile Video Browsing, A
* Study on the Use of a Binary Local Descriptor and Color Extensions of Local Descriptors for Video Concept Detection, A
* Surveillance Video Index and Browsing System Based on Object Flags and Video Synopsis, A
* Synchronization Ground Truth for the Jiku Mobile Video Dataset, A
* Text Detection in Natural Images Using Localized Stroke Width Transform
* Towards Consent-Based Lifelogging in Sport Analytic
* Travel Recommendation via Author Topic Model Based Collaborative Filtering
* Two-Dimensional Euler PCA for Face Recognition
* Unified Model for Socially Interconnected Multimedia-Enriched Objects, A
* User-Centred Evaluation to Interface Design of E-Books
* VERGE: A Multimodal Interactive Video Search Engine
* Visual Attention Driven by Auditory Cues
* Wearable Cameras for Real-Time Activity Annotation
* Web Portal for Effective Multi-model Exploration, A
* What Image Classifiers Really See: Visualizing Bag-of-Visual Words Models
* Wifbs: A Web-Based Image Feature Benchmark System
* Wireless Video Surveillance System Based on Incremental Learning Face Detection
110 for MMMod15

MMMod16 * *Advances in Multimedia Modeling
* 1D Barcode Region Detection Based on the Hough Transform and Support Vector Machine
* Adaptive Multichannel Reduction Using Convex Polyhedral Loudspeaker Array
* Adaptive Synopsis of Non-Human Primates' Surveillance Video Based on Behavior Classification
* Advancing Iterative Quantization Hashing Using Isotropic Prior
* Analysis and Comparison of Inter-Channel Level Difference and Interaural Level Difference
* Applying Visual User Interest Profiles for Recommendation and Personalisation
* Attribute Discovery for Person Re-Identification
* Automatic Endmember Extraction Using Pixel Purity Index for Hyperspectral Imagery
* Automatic Scribble Simulation for Interactive Image Segmentation Evaluation
* Bag Detection and Retrieval in Street Shots
* Camera Network Based Person Re-identification by Leveraging Spatial-Temporal Constraint and Multiple Cameras Relations
* Client-Driven Strategy of Large-Scale Scene Streaming
* Collaborative Q-Learning Based Routing Control in Unstructured P2P Networks
* Collaborative Video Search Combining Video Retrieval with Human-Based Visual Inspection
* Compound Figure Separation Combining Edge and Band Separator Detection
* Computational Cartoonist: A Comic-Style Video Summarization System for Anime Films
* Computational Face Reader
* Consensus Guided Multiple Match Removal for Geometry Verification in Image Retrieval
* Cross-Media Retrieval via Semantic Entity Projection
* Cross-Modal Fashion Search
* Dealing with Ambiguous Queries in Multimodal Video Retrieval
* Deep Learning Generic Features for Cross-Media Retrieval
* Depth Map Coding by Modeling the Locality and Local Correlation of View Synthesis Distortion in 3-D Video
* Describing Images with Ontology-Aware Dictionary Learning
* DFRS: A Large-Scale Distributed Fingerprint Recognition System Based on Redis
* Discriminant Manifold Learning via Sparse Coding for Image Analysis
* Discriminative Feature Learning with an Optimal Pattern Model for Image Classification
* Dominant Set Based Data Clustering and Image Segmentation
* Driver Fatigue Detection System Based on DSP Platform
* E2SGM: Event Enrichment and Summarization by Graph Model
* Edit-Based Font Search
* Effective Face Verification Algorithm to Fuse Complete Features in Convolutional Neural Network, An
* Efficient Perceptual Region Detector Based on Object Boundary
* Elastic Edge Boxes for Object Proposal on RGB-D Images
* Enhancement for Dust-Sand Storm Images
* Evaluating Access Mechanisms for Multimodal Representations of Lifelogs
* Exploring Discriminative Views for 3D Object Retrieval
* Exploring Relationship Between Face and Trustworthy Impression Using Mid-level Facial Features
* Exploring the Long Tail of Social Media Tags
* Extracting Visual Knowledge from the Internet: Making Sense of Image Data
* Face Image Super-Resolution Through Improved Neighbor Embedding
* Faceted Navigation for Browsing Large Video Collection
* Facial Age Estimation with Images in the Wild
* Fast 3D Indoor-Localization Approach Based on Video Queries, A
* Fast Nearest Neighbor Search in the Hamming Space
* Fast Visual Vocabulary Construction for Image Retrieval Using Skewed-Split k-d Trees
* Frame-Wise Continuity-Based Video Summarization and Stretching
* Global Contrast Based Salient Region Boundary Sampling for Action Recognition
* GrillCam: A Real-Time Eating Action Recognition System
* Group Feature Selection for Audio-Based Video Genre Classification
* iAutoMotion: an Autonomous Content-Based Video Retrieval Engine
* Image Classification Using Spatial Difference Descriptor Under Spatial Pyramid Matching Framework
* Image Retrieval Using Color-Aware Tag on Progressive Image Search and Recommendation System
* IMOTION: Searching for Video Sequences Using Multi-Shot Sketch Queries
* Improved RANSAC Image Stitching Algorithm Based Similarity Degree, An
* Informed Perspectives on Human Annotation Using Neural Signals
* Instance Search with Weak Geometric Correlation Consistency
* Interactive Search in Video: Navigation With Flick Gestures vs. Seeker-Bars
* Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation
* Learning Multiple Views with Orthogonal Denoising Autoencoders
* Learning Relative Aesthetic Quality with a Pairwise Approach
* Level Ratio Based Inter and Intra Channel Prediction with Application to Stereo Audio Frame Loss Concealment
* Locality Constrained Sparse Representation for Cat Recognition
* Location-Aware Image Classification
* LoggerMan, a Comprehensive Logging and Visualization Tool to Capture Computer Usage
* Logo Recognition via Improved Topological Constraint
* Mental Visual Browsing
* METU-MMDS: An Intelligent Multimedia Database System for Multimodal Content Extraction and Querying
* Multi-modal Image Re-ranking with Autoencoders and Click Semantics
* Multi-sketch Semantic Video Browser
* MusicMixer: Automatic DJ System Considering Beat and Latent Topic Similarity
* Navigating a Graph of Scenes for Exploring Large Video Collections
* NEWSMAN: Uploading Videos over Adaptive Middleboxes to News Servers in Weak Network Infrastructures
* No-reference Image Quality Assessment Based on Structural and Luminance Information
* Novel Emotional Saliency Map to Model Emotional Attention Mechanism, A
* OGB: A Distinctive and Efficient Feature for Mobile Augmented Reality
* Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection
* Packet Scheduling Method for Multimedia QoS Provisioning, A
* Pairing Contour Fragments for Object Recognition
* Personalized Annotation for Mobile Photos Based on User's Social Circle
* Posed and Spontaneous Expression Recognition Through Restricted Boltzmann Machine
* Private Video Foreground Extraction Through Chaotic Mapping Based Encryption in the Cloud
* Quality Analysis on Mobile Devices for Real-Time Feedback
* R-CNN Based Method to Localize Speech Balloons in Comics, An
* Real-Time Grayscale-Thermal Tracking via Laplacian Sparse Representation
* Respiration Motion State Estimation on 4D CT Rib Cage Images
* Reverse Testing Image Set Model Based Multi-view Human Action Recognition
* Robust Crowd Segmentation and Counting in Indoor Scenes
* Robust Object Tracking Using Valid Fragments Selection
* Robust Sketch-Based Image Retrieval by Saliency Detection
* Searching in Video Collections Using Sketches and Sample Images: The Cineast System
* Second-Layer Navigation in Mobile Hypervideo for Medical Training
* Selecting User Generated Content for Use in Media Productions
* SELSH: A Hashing Scheme for Approximate Similarity Search with Early Stop Condition
* Sentiment Analysis on Multi-View Social Data
* Shaping-Up Multimedia Analytics: Needs and Expectations of Media Professionals
* Sign Language Recognition Based on Trajectory Modeling with HMMs
* Single Image Super-Resolution via Convolutional Neural Network and Total Variation Regularization
* Sketch-Based Image Retrieval with a Novel BoVW Representation
* Smart Ambient Sound Analysis via Structured Statistical Modeling
* SOMH: A Self-Organizing Map Based Topology Preserving Hashing Method
* Spatial Constrained Fine-Grained Color Name for Person Re-identification
* Symmetry-Aware Human Shape Correspondence Using Skeleton
* Ten Research Questions for Scalable Multimedia Analytics
* Towards Training-Free Refinement for Semantic Indexing of Visual Media
* Transfer Nonnegative Matrix Factorization for Image Representation
* TV Commercial Detection Using Success Based Locally Weighted Kernel Combination
* User Profiling by Combining Topic Modeling and Pointwise Mutual Information (TM-PMI)
* Using Instagram Picture Features to Predict Users' Personality
* Utilizing Sensor-Social Cues to Localize Objects-of-Interest in Outdoor UGVs
* VERGE: A Multimodal Interactive Search Engine for Video Browsing and Retrieval
* Very Deep Sequences Learning Approach for Human Action Recognition, A
* Video Content Representation Using Recurring Regions Detection
* Video Event Detection Using Kernel Support Vector Machine with Isotropic Gaussian Sample Uncertainty (KSVM-iGSU)
* Videopedia: Lecture Video Recommendation for Educational Blogs Using Topic Modeling
* Visual Analyses of Music Download History: User Studies
* Visual Re-ranking Through Greedy Selection and Rank Fusion
* What are the Limits to Time Series Based Recognition of Semantic Concepts?
* What Catches Your Eyes as You Move Around? On the Discovery of Interesting Regions in the Street
* XTemplate 4.0: Providing Adaptive Layouts and Nested Templates for Hypermedia Documents
121 for MMMod16

MMMod17 * *Advances in Multimedia Modeling
* 3D Sound Field Reproduction at Non Central Point for NHK 22.2 System
* Adaptive and Optimal Combination of Local Features for Image Retrieval
* Annotation System for Egocentric Image Media, An
* Augmented Telemedicine Platform for Real-Time Remote Medical Consultation
* Binaural Sound Source Distance Reproduction Based on Distance Variation Function and Artificial Reverberation
* Boredom Recognition Based on Users' Spontaneous Behaviors in Multiparty Human-Robot Interactions
* CELoF: WiFi Dwell Time Estimation in Free Environment
* Classification of sMRI for AD Diagnosis with Convolutional Neuronal Networks: A Pilot 2-D+ epsilon Study on ADNI
* Collaborative Dictionary Learning and Soft Assignment for Sparse Coding of Image Features
* Collaborative Feature Maps for Interactive Video Search
* Color Consistency for Photo Collections Without Gamut Problems
* Color-Introduced Frame-to-Model Registration for 3D Reconstruction
* Compact CNN Based Video Representation for Efficient Video Copy Detection
* Comparative Study for Known Item Visual Search Using Position Color Feature Signatures, A
* Comparison of Approaches for Automated Text Extraction from Scholarly Figures, A
* Comparison of Fine-Tuning and Extension Strategies for Deep Convolutional Neural Networks
* Compressing Visual Descriptors of Image Sequences
* Concept-Based Interactive Search System
* Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding, A
* Cross-Modal Recipe Retrieval: How to Cook this Dish?
* Deep Convolutional Neural Network for Bidirectional Image-Sentence Mapping
* Deep Learning Based Intelligent Basketball Arena with Energy Image
* Deep Learning for Shot Classification in Gynecologic Surgery Videos
* DeepStyleCam: A Real-Time Style Transfer App on iOS
* Demo for Image-Based Personality Test, A
* Demographic Attribute Inference from Social Multimedia Behaviors: A Cross-OSN Approach
* Describing Geographical Characteristics with Social Images
* Description Logics and Rules for Multimodal Situational Awareness in Healthcare
* Discovering Geographic Regions in the City Using Social Multimedia and Open Data
* Discovering User Interests from Social Images
* Effect of Junk Images on Inter-concept Distance Measurement: Positive or Negative?
* Efficient Multi-scale Plane Extraction Based RGBD Video Segmentation
* egoPortray: Visual Exploration of Mobile Communication Signature from Egocentric Network Perspective
* Enhanced Retrieval and Browsing in the IMOTION System
* Evaluation of Video Browsing on Tablets with the ThumbBrowser, An
* Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity
* Exploring Large Movie Collections: Comparing Visual Berrypicking and Traditional Browsing
* Facial Expression Recognition by Fusing Gabor and Local Binary Pattern Features
* Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network
* Frame-Independent and Parallel Method for 3D Audio Real-Time Rendering on Mobile Devices
* Framework of Privacy-Preserving Image Recognition for Image-Based Information Services, A
* Fully Convolutional Network with Superpixel Parsing for Fashion Web Image Segmentation
* Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space
* Human Pose Tracking Using Online Latent Structured Support Vector Machine
* i-Stylist: Finding the Right Dress Through Your Social Networks
* Illumination-Preserving Embroidery Simulation for Non-photorealistic Rendering
* Improving the Discriminative Power of Bag of Visual Words Model
* Joint Face Detection and Initialization for Face Alignment
* Large-Scale Product Classification via Spatial Attention Based CNN Learning and Multi-class Regression
* Learning Features Robust to Image Variations with Siamese Networks for Facial Expression Recognition
* LingoSent: A Platform for Linguistic Aware Sentiment Analysis for Social Media Messages
* M-SBIR: An Improved Sketch-Based Image Retrieval Method Using Visual Word Mapping
* M3LH: Multi-modal Multi-label Hashing for Large Scale Data Search
* Micro-Expression Recognition by Aggregating Local Spatio-Temporal Patterns
* Model-Based 3D Scene Reconstruction Using a Moving RGB-D Camera
* Modeling User Performance for Moving Target Selection with a Delayed Mouse
* Movie Recommendation via BLSTM
* Multi-attribute Based Fire Detection in Diverse Surveillance Videos
* Multi-Task Multi-modal Semantic Hashing for Web Image Retrieval with Limited Supervision
* Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation
* Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
* No-Reference Image Quality Assessment Based on Internal Generative Mechanism
* Novel Affective Visualization System for Videos Based on Acoustic and Visual Features, A
* Novel Two-Step Integer-pixel Motion Estimation Algorithm for HEVC Encoding on a GPU, A
* Object-Based Aggregation of Deep Features for Image Retrieval
* On the Exploration of Convolutional Fusion Networks for Visual Recognition
* Online User Modeling for Interactive Streaming Image Classification
* Perceptual Lossless Quantization of Spatial Parameter for 3D Audio Signals, The
* Phase Fourier Reconstruction for Anomaly Detection on Metal Surface Using Salient Irregularity
* Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators, A
* Recognizing Emotions Based on Human Actions in Videos
* ReMagicMirror: Action Learning Using Human Reenactment with the Mirror Metaphor
* Robust Image Classification via Low-Rank Double Dictionary Learning
* Robust Scene Text Detection for Multi-script Languages Using Deep Learning
* Robust Visual Tracking Based on Multi-channel Compressive Features
* Rocchio-Based Relevance Feedback in Video Event Retrieval
* Scalable Video Conferencing System Using Cached Facial Expressions, A
* Scale-Relation Feature for Moving Cast Shadow Detection
* Semantic Extraction and Object Proposal for Video Search
* Sensor-Based Official Basketball Referee Signals Recognition System Using Deep Belief Networks, A
* Single Image Super-Resolution with a Parameter Economic Residual-Like Convolutional Neural Network
* Smart Loudspeaker Arrays for Self-Coordination and User Tracking
* Spatial Verification via Compact Words for Mobile Instance Search
* Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
* Speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting
* Stochastic Decorrelation Constraint Regularized Auto-Encoder for Visual Recognition
* Storyboard-Based Video Browsing Using Color and Concept Indices
* Structural Coupled-Layer Tracking Method Based on Correlation Filters, A
* Structure-Aware Image Resizing for Chinese Characters
* Supervised Class Graph Preserving Hashing for Image Retrieval and Classification
* Understanding Performance of Edge Prefetching
* Unified Framework for Monocular Video-Based Facial Motion Tracking and Expression Recognition, A
* Unsupervised Multiple Object Cosegmentation via Ensemble MIML Learning
* User Identification by Observing Interactions with GUIs
* Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images
* Utilizing Locality-Sensitive Hash Learning for Cross-Media Retrieval
* Uyghur Language Text Detection in Complex Background Images Using Enhanced MSERs
* V-Head: Face Detection and Alignment for Facial Augmented Reality Applications
* VERGE in VBS 2017
* Video Hunter at VBS 2017
* Video Search via Ranking Network with Very Few Query Exemplars
* Virtual Reality Framework for Multimodal Imagery for Vessels in Polar Regions, A
* Visual Robotic Object Grasping Through Combining RGB-D Data and 3D Meshes
* Web-Based Service for Disturbing Image Detection, A
* What are Good Design Gestures?
* What Convnets Make for Image Captioning?
107 for MMMod17

MMMod18 * *Advances in Multimedia Modeling
* Accurate Detection for Scene Texts with a Cascaded CNN Networks
* ActionVis: An Explorative Tool to Visualize Surgical Actions in Gynecologic Laparoscopy
* Adaptive Image Representation Using Information Gain and Saliency: Application to Cultural Heritage Datasets
* AGO: Accelerating Global Optimization for Accurate Stereo Matching
* Approaches for Event Segmentation of Visual Lifelog Data
* AR DeepCalorieCam: An iOS App for Food Calorie Estimation with Augmented Reality
* Auto Accessory Segmentation and Interactive Try-on System
* Automatic Smoke Classification in Endoscopic Video
* CAMETRON Lecture Recording System: High Quality Video Recording and Editing with Minimal Human Supervision, The
* Category Specific Post Popularity Prediction
* Cloud of Line Distribution and Random Forest Based Text Detection from Natural/Video Scene Images
* CNN-Based DCT-Like Transform for Image Compression
* Co-occurrent Structural Edge Detection for Color-Guided Depth Map Super-Resolution
* Coarse-to-Fine Image Super-Resolution Using Convolutional Neural Networks
* Collision-Free LSTM for Human Trajectory Prediction
* Competitive Video Retrieval with VITRIVR
* Convolution with Logarithmic Filter Groups for Efficient Shallow CNN
* Cost-Sensitive Deep Metric Learning for Fine-Grained Image Classification
* Crowd Distribution Estimation with Multi-scale Recursive Convolutional Neural Network
* Data Augmentation for EEG-Based Emotion Recognition with Deep Convolutional Neural Networks
* Deep Convolutional Neural Network for Correlating Images and Sentences
* Deep Pedestrian Detection Using Contextual Information and Multi-level Features
* Depth Representation of LiDAR Point Cloud with Adaptive Surface Patching for Object Classification
* Domain Invariant Subspace Learning for Cross-Modal Retrieval
* Dual-Way Guided Depth Image Inpainting with RGBD Image Pairs
* Effective Action Detection Using Temporal Context and Posterior Probability of Length
* Efficient and Interactive Spatial-Semantic Image Retrieval
* Efficient Two-Layer Model Towards Cover Song Identification
* Enhanced VIREO KIS at VBS 2018
* Evaluation of Visual Content Descriptors for Supporting Ad-Hoc Video Search Tasks at the Video Browser Showdown
* Find Me a Sky: A Data-Driven Method for Color-Consistent Sky Search and Replacement
* Font Recognition in Natural Images via Transfer Learning
* Food Photo Recognition for Dietary Tracking: System and Experiment
* Frame-Based Classification of Operation Phases in Cataract Surgery Videos
* Fusing Keyword Search and Visual Exploration for Untagged Videos
* Fusion Networks for Air-Writing Recognition
* Global and Local C3D Ensemble System for First Person Interactive Action Recognition
* High-Precision 3D Coarse Registration Using RANSAC and Randomly-Picked Rejections
* Image Aesthetic Distribution Prediction with Fully Convolutional Network
* Image Aesthetics and Content in Selecting Memorable Keyframes from Lifelogs
* ImageX - Explore and Search Local/Private Images
* Implicit Affective Video Tagging Using Pupillary Response
* Improving the Quality of Video-to-Language Models by Optimizing Annotation of the Training Material
* ITEC Collaborative Video Search System at the Video Browser Showdown 2018, The
* Iterative Active Classification of Large Image Collection
* k-Labelsets for Multimedia Classification with Global and Local Label Correlation
* Learning to Index in Large-Scale Datasets
* Lifelog Exploration Prototype in Virtual Reality
* Light Field Foreground Matting Based on Defocus and Correspondence
* LOCO: Local Context Based Faster R-CNN for Small Traffic Sign Detection
* Long Tail of Web Video, The
* LVFS: A Lightweight Video Storage File System for IP Camera-Based Surveillance Applications
* Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain, A
* Method of Weather Radar Echo Extrapolation Based on Convolutional Neural Networks, A
* Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos, A
* Multi-camera Microenvironment to Capture Multi-view Time-Lapse Videos for 3D Analysis of Aging Objects
* Multi-hypothesis-Based Error Concealment for Whole Frame Loss in HEVC
* Multi-stream Fusion Model for Social Relation Recognition from Videos
* Multimodal Augmented Reality: Augmenting Auditory-Tactile Feedback to Change the Perception of Thickness
* New Accurate Image Denoising Method Based on Sparse Coding Coefficients, A
* Novel 3D Human Action Recognition Framework for Video Content Analysis, A
* Novel Frontal Facial Synthesis Algorithm Based on Individual Residual Face, A
* On the Traceability of Results from Deep Learning-Based Cloud Services
* Ontlus: 3D Content Collaborative Creation via Virtual Reality
* Parameter Selection for Denoising Algorithms Using NR-IQA with CNN
* Person Re-id by Incorporating PCA Loss in CNN
* Programmatic 3D Printing of a Revolving Camera Track to Automatically Capture Dense Images for 3D Scanning of Objects
* Real-Time Polyps Segmentation for Colonoscopy Video Frames Using Compressed Fully Convolutional Network
* Recursive Pyramid Network with Joint Attention for Cross-Media Retrieval
* Reinforcing Pedestrian Parsing on Small Scale Dataset
* Remote Sensing Image Fusion Based on Two-Stream Fusion Network
* Rethinking Summarization and Storytelling for Modern Social Multimedia
* Revisiting SIRET Video Retrieval Tool
* REVT: Robust and Efficient Visual Tracking by Region-Convolutional Regression Network
* RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder, An
* Robust and Real-Time Visual Tracking Based on Complementary Learners
* Room Floor Plan Generation on a Project Tango Device
* Scalable Bag of Selected Deep Features for Visual Instance Retrieval
* SeqSense: Video Recommendation Using Topic Sequence Mining
* Shallow-Water Image Enhancement Using Relative Global Histogram Stretching Based on Adaptive Parameter Acquisition
* ShapeCreator: 3D Shape Generation from Isomorphic Datasets Based on Autoencoder
* Sketch-Based Similarity Search for Collaborative Feature Maps
* Sloth Search System
* Source Distortion Estimation for Wyner-Ziv Distributed Video Coding
* Spatiotemporal 3D Models of Aging Fruit from Multi-view Time-Lapse Videos
* SRN: The Movie Character Relationship Analysis via Social Network
* Stitch-Based Image Stylization for Thread Art Using Sparse Modeling
* Teacher and Student Joint Learning for Compact Facial Landmark Detection Network
* Text Image Deblurring via Intensity Extremums Prior
* Text Recognition and Retrieval System for e-Business Image Management, A
* Towards Demographic-Based Photographic Aesthetics Prediction for Portraitures
* Triplet Convolutional Network for Music Version Identification
* Two-Level Segment-Based Bitrate Control for Live ABR Streaming
* Uyghur Text Localization with Fast Component Detection
* Vehicle Semantics Extraction and Retrieval for Long-Term Carpark Video Surveillance
* Venue Prediction for Social Images by Exploiting Rich Temporal Patterns in LBSNs
* VERGE in VBS 2018
* Video Browsing on a Circular Timeline
* Video Search Based on Semantic Extraction and Locally Regional Object Proposal
* Virtual Reality Interface for Interactions with Spatiotemporal 3D Data, A
102 for MMMod18

MMMod19 * 3D Object Completion via Class-Conditional Generative Adversarial Network
* 3D ResNets for 3D Object Classification
* 3D Skeletal Gesture Recognition via Sparse Coding of Time-Warping Invariant Riemannian Trajectories
* Accelerating Topic Detection on Web for a Large-Scale Data Set via Stochastic Poisson Deconvolution
* Action Recognition Using Visual Attention with Reinforcement Learning
* Adaptive Alignment Network for Person Re-identification
* Adversarial Training for Video Disentangled Representation
* Alignment of Deep Features in 3D Models for Camera Pose Estimation
* Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition
* Audio-Based Automatic Generation of a Piano Reduction Score by Considering the Musical Structure
* Audiovisual Annotation Procedure for Multi-view Field Recordings
* Automatic Classification and Linguistic Analysis of Extremist Online Material
* Automatic Segmentation of Brain Tumor Image Based on Region Growing with Co-constraint
* Automatic System for Generating Artificial Fake Character Images, An
* Autopiloting Feature Maps: The Deep Interactive Video Exploration (diveXplore) System at VBS2019
* Bag of Deep Features for Instructor Activity Recognition in Lecture Room
* Challenges in Audio Processing of Terrorist-Related Data
* Character Prediction in TV Series via a Semantic Projection Network
* CNN-Based Non-contact Detection of Food Level in Bottles from RGB Images
* DANTE Speaker Recognition Module. An Efficient and Robust Automatic Speaker Searching Solution for Terrorism-Related Scenarios
* Deep Hashing with Triplet Labels and Unification Binary Code Selection for Fast Image Retrieval
* Deep Learning-Based Concept Detection in vitrivr
* Deep Neural Network Based 3D Articulatory Movement Prediction Using Both Text and Audio Inputs
* Deep Recurrent Neural Network for Multi-target Filtering
* Deep Reinforcement Learning for Automatic Thumbnail Generation
* Detail-Preserving Trajectory Summarization Based on Segmentation and Group-Based Filtering
* Detecting Tampered Videos with Multimedia Forensics and Deep Learning
* Early Identification of Oil Spills in Satellite Images Using Deep CNNs
* ECAT: Endoscopic Concept Annotation Tool
* Effective Dual-Fisheye Lens Stitching Method Based on Feature Points, An
* Efficient Graph Based Multi-view Learning
* Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention
* evolve2vec: Learning Network Representations Using Temporal Unfolding
* Exploiting Incidence Relation Between Subgroups for Improving Clustering-Based Recommendation Model
* Exploring the Impact of Training Data Bias on Automatic Generation of Video Captions
* Face Swapping for Solving Collateral Privacy Issues in Multimedia Analytics
* Fashion Police: Towards Semantic Indexing of Clothing Information in Surveillance Data
* Fontender: Interactive Japanese Text Design with Dynamic Font Fusion Method for Comics
* Four Models for Automatic Recognition of Left and Right Eye in Fundus Images
* Foveated Ray Tracing for VR Headsets
* From Classical to Generalized Zero-Shot Learning: A Simple Adaptation Process
* From Movement to Events: Improving Soccer Match Annotations
* Gated Recurrent Capsules for Visual Word Embeddings
* Generative Adversarial Networks with Enhanced Symmetric Residual Units for Single Image Super-Resolution
* Genetic Programming Approach to Integrate Multilayer CNN Features for Image Classification, A
* Greedy Salient Dictionary Learning for Activity Video Summarization
* Hierarchical Bayesian Network Based Incremental Model for Flood Prediction
* Hierarchical Level Set Approach to for RGBD Image Matting, A
* Hierarchical Temporal Pooling for Efficient Online Action Recognition
* Hierarchical Vision-Language Alignment for Video Captioning
* Identifying Terrorism-Related Key Actors in Multidimensional Social Networks
* Image Aesthetics Assessment Using Fully Convolutional Neural Networks
* Impact of Packet Loss and Google Congestion Control on QoE for WebRTC-Based Mobile Multiparty Audiovisual Telemeetings, The
* Improving Micro-expression Recognition Accuracy Using Twofold Feature Extraction
* Improving Robustness of Image Tampering Detection for Compression
* Incremental Training for Face Recognition
* Integration of Exploration and Search: A Case Study of the M3 Model
* Joint EPC and RAN Caching of Tiled VR Videos for Mobile Networks
* Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism
* Large Scale Audio-Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks
* Method for Enriching Video-Watching Experience with Applied Effects Based on Eye Movements, A
* Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification
* Multimodal Video Annotation for Retrieval and Discovery of Newsworthy Video in a News Verification Scenario
* Near-Duplicate Video Retrieval Through Toeplitz Kernel Partial Least Squares
* Neuropsychiatric Disorders Identification Using Convolutional Neural Network
* New Female Body Segmentation and Feature Localisation Method for Image-Based Anthropometry, A
* New Hybrid Architecture for Human Activity Recognition from RGB-D Videos, A
* No-Reference Video Quality Assessment Based on Ensemble of Knowledge and Data-Driven Models
* On the Unsolved Problem of Shot Boundary Detection for Music Videos
* Person Re-Identification Based on Pose-Aware Segmentation
* Personalized Recommendation of Photography Based on Deep Learning
* Photo-Realistic Facial Emotion Synthesis Using Multi-level Critic Networks with Multi-level Generative Model
* Point Cloud Colorization Based on Densely Annotated 3D Shape Dataset
* Poses Guide Spatiotemporal Model for Vehicle Re-identification
* Preferred Model of Adaptation to Dark for Virtual Reality Headsets
* Proposal of an Annotation Method for Integrating Musical Technique Knowledge Using a GTTM Time-Span Tree
* psDirector: An Automatic Director for Watching View Generation from Panoramic Soccer Video
* Query-by-Dancing: A Dance Music Retrieval System Based on Body-Motion Similarity
* Realtime Human Segmentation in Video
* Regular and Small Target Detection
* Reliability Object Layer for Deep Hashing-Based Visual Indexing, A
* Representation of Speech in Deep Neural Networks, The
* Robust Multi-Athlete Tracking Algorithm by Exploiting Discriminant Features and Long-Term Dependencies, A
* SCOD: Dynamical Spatial Constraints for Object Detection
* Semantic Knowledge Discovery Framework for Detecting Online Terrorist Networks, A
* Semantic Map Annotation Through UAV Video Analysis Using Deep Learning Models in ROS
* Sentiment-Aware Multi-modal Recommendation on Tourist Attractions
* SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives
* Single-Stage Detector with Semantic Attention for Occluded Pedestrian Detection
* Soccer Video Event Detection Based on Deep Learning
* Space Wars: An AugmentedVR Game
* Spatio-Temporal Attention Model Based on Multi-view for Social Relation Understanding
* Spectral Tilt Estimation for Speech Intelligibility Enhancement Using RNN Based on All-Pole Model
* STMP: Spatial Temporal Multi-level Proposal Network for Activity Detection
* Subjective Visual Quality Assessment of Immersive 3D Media Compressed by Open-Source Static 3D Mesh Codecs
* Task-Driven Biometric Authentication of Users in Virtual Reality (VR) Environments
* Temporal Action Localization Based on Temporal Evolution Model and Multiple Instance Learning
* Temporal Lecture Video Fragmentation Using Word Embeddings
* Test Collection for Interactive Lifelog Retrieval, A
* Training Researchers with the MOVING Platform
* Two-Level Attention with Multi-task Learning for Facial Emotion Estimation
* Understanding Intonation Trajectories and Patterns of Vocal Notes
* User Interaction for Visual Lifelog Retrieval in a Virtual Environment
* Using Coarse Label Constraint for Fine-Grained Visual Classification
* Utilizing Deep Object Detector for Video Surveillance Indexing and Retrieval
* V3C: A Research Video Collection
* VERGE in VBS 2019
* Video Summarization with LSTM and Deep Attention Models
* Violin Timbre Navigator: Real-Time Visual Feedback of Violin Bowing Based on Audio Analysis and Machine Learning
* VIREO@Video Browser Showdown 2019
* VIRET Tool Meets NasNet
* VISIONE at VBS2019
* Visual Urban Perception with Deep Semantic-Aware Network
113 for MMMod19

MMMod20 * *Advances in Multimedia Modeling
* 3-D Oral Shape Retrieval Using Registration Algorithm
* 3D Spatial Coverage Measurement of Aerial Images
* Action Co-localization in an Untrimmed Video by Graph Neural Networks
* Adversarial Query-by-image Video Retrieval Based on Attention Mechanism
* AttenNet: Deep Attention Based Retinal Disease Classification in OCT Images
* Attention Based Speaker-independent Audio-visual Deep Learning Model for Speech Enhancement, An
* Automatic Material Classification Using Thermal Finger Impression
* Background Segmentation for Vehicle Re-identification
* Baseline Analysis of a Conventional and Virtual Reality Lifelog Retrieval System
* Beyond Literal Visual Modeling: Understanding Image Metaphor Based on Literal-implied Concept Mapping
* Browsing Visual Sentiment Datasets Using Psycholinguistic Groundings
* CartoonRenderer: An Instance-based Multi-style Cartoon Image Translator
* Classroom Attention Analysis Based on Multiple Euler Angles Constraint and Head Pose Estimation
* CNN-based Multi-scale Super-resolution Architecture on FPGA for 4K/8K UHD Applications, A
* Combining Boolean and Multimedia Retrieval in VITRIVR for Large-scale Video Search
* Compact Deep Neural Network for Single Image Super-resolution, A
* Compact Position-aware Attention Network for Image Semantic Segmentation
* Content-aware Cubemap Projection for Panoramic Image via Deep Q-learning
* Context-aware Residual Network with Promotion Gates for Single Image Super-resolution
* Cross Fusion for Egocentric Interactive Action Recognition
* Crowd Knowledge Enhanced Multimodal Conversational Assistant in Travel Domain
* Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos, A
* Deep Learning-based Video Retrieval Using Object Relationships and Associated Audio Classes
* Deep Palette-based Color Decomposition for Image Recoloring with Aesthetic Suggestion
* DeepStroke: Understanding Glyph Structure with Semantic Segmentation and Tabu Search
* Deformed Phase Prediction Using SVM for Structured Light Depth Generation
* Delay-aware Adaptation Framework for Cloud Gaming Under the Computation Constraint of User Devices, A
* DIME: An Online Tool for the Visual Comparison of Cross-modal Retrieval Models
* Distinct Synthesizer Convolutional TasNet for Singing Voice Separation, A
* diveXplore 4.0: The ITEC Deep Interactive Video Exploration System at VBS2020
* Down-sampling Based Video Coding with Degradation-aware Restoration-reconstruction Deep Neural Network
* Effective Barcode Hunter via Semantic Segmentation in the Wild
* Effective Utilization of Hybrid Residual Modules in Deep Neural Networks for Super Resolution
* Effective Way to Boost Black-box Adversarial Attack, An
* Efficient Algorithm of Facial Expression Recognition by TSG-RNN Network, An
* Efficient Edge Caching for High-quality 360-degree Video Delivery
* Efficient Encoding Method for Video Compositing in HEVC, An
* Efficient HEVC Downscale Transcoding Based on Coding Unit Information Mapping
* Efficient Hierarchical Near-duplicate Video Detection Algorithm Based on Deep Semantic Features, An
* Emotion Recognition with Facial Landmark Heatmaps
* Enhanced Gaze Following via Object Detection and Human Pose Estimation
* Eulerian Motion Based 3DCNN Architecture for Facial Micro-expression Recognition
* Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos
* Experiences and Insights from the Collection of a Novel Multimedia EEG Dataset
* Exploiting the Importance of Personalization When Selecting Music for Relaxation
* Exquisitor at the Video Browser Showdown 2020
* Extensible Framework for Interactive Real-time Visualizations of Large-scale Heterogeneous Multimedia Information from Online Sources, An
* Extraction of Multi-class Multi-instance Geometric Primitives from Point Clouds Using Energy Minimization
* Face Attributes Recognition Based on One-way Inferential Correlation Between Attributes
* Face Super-resolution by Learning Multi-view Texture Compensation
* Face Tells Detailed Expression: Generating Comprehensive Facial Expression Sentence Through Facial Action Units
* Facial Expression Restoration Based on Improved Graph Convolutional Networks
* Fine-grain Level Sports Video Search Engine
* Framework Design for Multiplayer Motion Sensing Game in Mixture Reality
* FurcaNeXt: End-to-end Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks
* GEN-RES-NET: A Novel Generative Model for Singing Voice Separation
* Generate Images with Obfuscated Attributes for Private Image Classification
* GLENDA: Gynecologic Laparoscopy Endometriosis Dataset
* Global Affective Video Content Regression Based on Complementary Audio-visual Features
* Guided Refine-head for Object Detection
* High Accuracy Perceptual Video Hashing via Low-rank Decomposition and DWT
* HMM-based Person Re-identification in Large-scale Open Scenario
* HRTF Representation with Convolutional Auto-encoder
* Illumination Insensitive and Structure-aware Image Color Layer Decomposition Method, An
* Image Captioning Based on Visual and Semantic Attention
* Improved Model Structure with Cosine Margin OIM Loss for End-to-end Person Search
* Improving Brain Tumor Segmentation with Dilated Pseudo-3d Convolution and Multi-direction Fusion
* Improving Just Noticeable Difference Model by Leveraging Temporal HVS Perception Characteristics
* Inferring Emphasis for Real Voice Data: An Attentive Multimodal Neural Network Approach
* InSphereNet: A Concise Representation and Classification Method for 3D Object
* Instance Image Retrieval with Generative Adversarial Training
* Instrument Recognition in Laparoscopy for Technical Skill Assessment
* Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings
* Interactive Video Search Platform for Multi-modal Retrieval with Advanced Concepts, An
* Inverse Mapping with Manifold Alignment for Zero-shot Learning, An
* IVIST: Interactive Video Search Tool in VBS 2020
* Joint Sketch-attribute Learning for Fine-grained Face Synthesis
* K-SVD Based Point Cloud Coding for RGB-D Video Compression Using 3D Super-point Clustering
* Korean Sign Language Dataset for Action Recognition, The
* Kvasir-SEG: A Segmented Polyp Dataset
* Law Is Order: Protecting Multimedia Network Transmission by Game Theory and Mechanism Design
* LDSNE: Learning Structural Network Embeddings by Encoding Local Distances
* Learning Multi-feature Based Spatially Regularized and Scale Adaptive Correlation Filters for Visual Tracking
* Light Field Reconstruction Using Dynamically Generated Filters
* Light Field Salient Object Detection via Hybrid Priors
* Lite Hourglass Network for Multi-person Pose Estimation
* Lyrics-conditioned Neural Melody Generation
* Marine Biometric Recognition Algorithm Based on YOLOV3-GAN Network
* Meta Transfer Learning for Adaptive Vehicle Tracking in UAV Videos
* Model-based and Class-based Fusion of Multisensor Data
* More-natural Mimetic Words Generation for Fine-grained Gait Description
* Multi-branch Body Region Alignment Network for Person Re-identification
* Multi-condition Place Generator for Robust Place Recognition
* Multi-data UAV Images for Large Scale Reconstruction of Buildings
* Multi-Hop Interactive Cross-modal Retrieval
* Multi-scale Comparison Network for Few-shot Learning
* Multi-scale Spatial Location Preference for Semantic Segmentation
* Multi-step Coding Structure of Spatial Audio Object Coding
* Multimedia Analytics Challenges and Opportunities for Creating Interactive Radio Content
* New Local Transformation Module for Few-shot Segmentation, A
* No Reference Image Quality Assessment by Information Decomposition
* Nova: A Tool for Explanatory Multimodal Behavior Analysis and Its Application to Psychotherapy
* Novel Attention Enhanced Dense Network for Image Super-resolution, A
* OmniEyes: Analysis and Synthesis of Artistically Painted Eyes
* On Creating Multimedia Interfaces for Hybrid Biological-digital Art Installations
* One-shot Face Recognition with Feature Rectification via Adversarial Learning
* Perceptual Localization of Virtual Sound Source Based on Loudspeaker Triplet
* Prediction-error Value Ordering for High-fidelity Reversible Data Hiding
* PRIME: Block-wise Missingness Handling for Multi-modalities in Intelligent Tutoring Systems
* Rational Delegation Computing Using Information Theory and Game Theory Approach
* Real-time Demonstration of Personal Audio and 3D Audio Rendering Using Line Array Systems
* Real-time Multiple Pedestrians Tracking in Multi-camera System
* Real-time Recognition of Daily Actions Based on 3D Joint Movements and Fisher Encoding
* Region Based Adversarial Synthesis of Facial Action Units
* Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection
* Resolution Booster: Global Structure Preserving Stitching Method for Ultra-high Resolution Image Translation
* Rethinking the Test Collection Methodology for Personal Self-tracking Data
* Robust RGB-D Data Registration Based on Correntropy and Bi-directional Distance
* SEE-LPR: A Semantic Segmentation Based End-to-end System for Unconstrained License Plate Detection and Recognition
* Semantic and Morphological Information Guided Chinese Text Classification
* Similarity Graph Convolutional Construction Network for Interactive Action Recognition
* Single View Depth Estimation via Dense Convolution Network with Self-supervision
* SOM-Hunter: Video Browsing with Relevance-to-SOM Feedback Loop
* Speaker-aware Speech Emotion Recognition by Fusing Amplitude and Phase Information
* Structural Pyramid Network for Cascaded Optical Flow Estimation
* Structured Neural Motifs: Scene Graph Parsing via Enhanced Context
* Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction
* Subclass Deep Neural Networks: Re-enabling Neglected Classes in Deep Network Training for Multimedia Classification
* Texture-based Fast CU Size Decision and Intra Mode Decision Algorithm for VVC
* Thermal Face Recognition Based on Transformation by Residual U-Net and Pixel Shuffle Upsampling
* TK-Text: Multi-shaped Scene Text Detection via Instance Segmentation
* Towards Accurate Panel Detection in Manga: A Combined Effort of CNN and Heuristics
* Unsupervised Feature Propagation for Fast Video Object Detection Using Generative Adversarial Networks
* Unsupervised Video Summarization via Attention-driven Adversarial Learning
* VERGE in VBS 2020
* VHS to HDTV Video Translation Using Multi-task Adversarial Learning
* VIREO @ Video Browser Showdown 2020
* VIRET at Video Browser Showdown 2020
* Visual Sentiment Analysis by Leveraging Local Regions and Human Faces
* Web-based Visualization Tool for 3D Spatial Coverage Measurement of Aerial Images, A
* Wonderful Clips of Playing Basketball: A Database for Localizing Wonderful Actions
142 for MMMod20

MMMod21 * *Advances in Multimedia Modeling
* Acceleration Framework for Super-resolution Network via Region Difficulty Self-adaption, An
* Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network, An
* Asymmetric Two-sided Penalty Term for CT-GAN, An
* Atypical Lyrics Completion Considering Musical Audio Signals
* Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network
* Automatic Pose Quality Assessment for Adaptive Human Pose Refinement
* Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-decoder
* CatMeows: A Publicly-available Dataset of Cat Vocalizations
* Classifier Belief Optimization for Visual Categorization
* Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition, A
* Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR
* Confidence-based Global Attention Guided Network for Image Inpainting
* Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations
* Contrastive Learning in Frequency Domain for Non-I.I.D. Image Classification
* Crossed-time Delay Neural Network for Speaker Recognition
* DANet: Deformable Alignment Network for Video Inpainting
* Deep 3D Modeling of Human Bodies from Freehand Sketching
* Deep Attributed Network Embedding with Community Information
* Deep Centralized Cross-modal Retrieval
* Deep Face Swapping via Cross-identity Adversarial Training
* DeepFusion: Deep Ensembles for Domain Independent System Fusion
* Dense Attention-Guided Network for Boundary-Aware Salient Object Detection
* Discriminative and Selective Pseudo-labeling for Domain Adaptation
* DVRCNN: Dark Video Post-processing Method for VVC
* EEG Emotion Recognition Based on Channel Attention for E-healthcare Applications
* Efficient Image Transmission Pipeline for Multimedia Services, An
* Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers
* Fast Discrete Matrix Factorization Hashing for Large-scale Cross-modal Retrieval
* Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation Audio Video Coding Standard
* Fast Optimal Transport Artistic Style Transfer
* Few-shot Learning with Unlabeled Outlier Exposure
* Fine-grained Generation for Zero-shot Learning
* Fine-grained Image-text Retrieval via Complementary Feature Learning
* Fine-grained Video Deblurring with Event Camera
* Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition
* Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms
* Game Input with Delay: A Model of the Time Distribution for Selecting a Moving Target with a Mouse
* Gaussian Mixture Model Based Semi-supervised Sparse Representation for Face Recognition
* Generative Image Inpainting by Hybrid Contextual Attention Network
* Global Cognition and Local Perception Network for Blind Image Deblurring
* Graph Structure Reasoning Network for Face Alignment and Reconstruction
* Graph-based Indexing and Retrieval of Lifelog Data
* Group Activity Recognition by Exploiting Position Distribution and Appearance Relation
* HTAD: A Home-tasks Activities Dataset with Wrist-accelerometer and Audio Features
* Hybrid Music Recommendation Algorithm Based on Attention Mechanism, A
* Illuminate Low-Light Image via Coarse-to-Fine Multi-Level Network
* Image Registration Improved by Generative Adversarial Networks
* Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data
* Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding
* Initialize with Mask: For More Efficient Federated Learning
* Interactive Video Search Tool: A Case Study Using the V3C1 Dataset, An
* IVIST: Interactive Video Search Tool in VBS 2021
* IVOS - The ITEC Interactive Video Object Search System at VBS2021
* Keystroke Dynamics as Part of Lifelogging
* Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy
* Language Person Search with Pair-based Weighting Loss
* Learning 3D-Craft Generation with Predictive Action Neural Network
* Learning from the Negativity: Deep Negative Correlation Meta-learning for Adversarial Image Classification
* Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition
* Less is More: diveXplore 5.0 at VBS 2021
* Locating Visual Explanations for Video Question Answering
* Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification
* MM-Net: Learning Adaptive Meta-metric for Few-shot Biometric Recognition
* MNR-AIR: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam
* Mobile eHealth Platform for Home Monitoring of Bipolar Disorder
* MovieWall: A New Interface for Browsing Large Video Collections, The
* MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting
* Multi-branch and Multi-scale Attention Learning for Fine-grained Visual Categorization
* Multi-grained Fusion for Conditional Image Retrieval
* Multi-granularity Recurrent Attention Graph Neural Network for Few-shot Learning
* Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-instance Normalization for Semantic Image Synthesis
* Multi-task Deep Learning for No-reference Screen Content Image Quality Assessment
* Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments
* Multimodal Tensor-based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images, A
* MusiCoder: A Universal Music-acoustic Encoder Based on Transformer
* NoShot Video Browser at VBS2021
* On Fusion of Learned and Designed Features for Video Data Analytics
* Res2-UNet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images
* Robust Multispectral Pedestrian Detection via Uncertainty-aware Cross-modal Learning
* Search and Explore Strategies for Interactive Analysis of Real-life Image Collections with Unknown and Unique Categories
* Sentiment Similarity-oriented Attention Model with Multi-task Learning for Text-based Emotion Recognition, A
* Shot Boundary Detection Through Multi-stage Deep Convolution Neural Network
* SOMHunter V2 at Video Browser Showdown 2021
* Spatial Gradient Guided Learning and Semantic Relation Transfer for Facial Landmark Detection
* SpotifyGraph: Visualisation of User's Preferences in Music
* SQL-Like Interpretable Interactive Video Search
* SQL-Like Interpretable Interactive Video Search
* Stacked Sparse Autoencoder for Audio Object Coding
* Structured Feature Learning Model for Clothing Keypoints Localization, A
* System for Interactive Multimedia Retrieval Evaluations, A
* Tell as You Imagine: Sentence Imageability-aware Image Captioning
* Thermal Face Recognition Based on Multi-scale Image Synthesis
* Time-dependent Body Gesture Representation for Video Emotion Recognition
* Towards Explainable Interactive Multi-modal Video Retrieval with vitrivr
* Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
* Towards the Development of a Trustworthy Chatbot for Mental Health Applications
* Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study
* Two-stage Real-time Multi-object Tracking with Candidate Selection
* Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation
* Unsupervised Multi-shot Person Re-identification via Dynamic Bi-directional Normalized Sparse Representation
* Unsupervised Temporal Attention Summarization Model for User Created Videos
* VERGE in VBS 2021
* Video Search with Collage Queries
* Video Search with Sub-image Keyword Transfer Using Existing Image Archives
* VideoGraph: Towards Using Knowledge Graphs for Interactive Video Retrieval
* VISIONE at Video Browser Showdown 2021
* VR Interface for Browsing Visual Spaces at VBS2021, A
* W2VV++ BERT Model at VBS 2021
* XQM: Interactive Learning on Mobile Phones
110 for MMMod21

MMMod22 * *Advances in Multimedia Modeling
* A-Muze-Net: Music Generation by Composing the Harmony Based on the Generated Melody
* AdaConfigure: Reinforcement Learning-Based Adaptive Configuration for Video Analytics Services
* Adaptive Speech Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN
* Adversarial Attacks on Deepfake Detectors: A Practical Analysis
* AI for the Media Industry: Application Potential and Automation Levels
* Arbitrary Style Transfer with Adaptive Channel Network
* AS-Net: Class-Aware Assistance and Suppression Network for Few-Shot Learning
* AVSeeker: An Active Video Retrieval Engine at VBS2022
* Bi-attention Modal Separation Network for Multimodal Video Fusion
* Category-Sensitive Incremental Learning for Image-Based 3D Shape Reconstruction
* CDC: Color-Based Diffusion Model with Caption Embedding in VBS 2022
* CDeRSNet: Towards High Performance Object Detection in Vietnamese Document Images
* Classroom Attention Estimation Method Based on Mining Facial Landmarks of Students
* Color the Word: Leveraging Web Images for Machine Translation of Untranslatable Words
* Combining Knowledge and Multi-modal Fusion for Meme Classification
* Complementary Fusion Strategy for RGB-D Face Recognition, A
* Compressive Sensing-Based Image Encryption and Authentication in Edge-Clouds
* Conditional Context-Aware Feature Alignment for Domain Adaptive Detection Transformer
* DataCAP: A Satellite Datacube and Crowdsourced Street-Level Images for the Monitoring of the Common Agricultural Policy
* Depthwise-Separable Residual Capsule for Robust Keyword Spotting
* DIG: A Data-Driven Impact-Based Grouping Method for Video Rebuffering Optimization
* diveXplore 6.0: ITEC's Interactive Video Exploration System at VBS 2022
* Double Granularity Relation Network with Self-criticism for Occluded Person Re-identification
* ECAS-ML: Edge Computing Assisted Adaptation Scheme with Machine Learning for HTTP Adaptive Streaming
* EEG Emotion Recognition Based on Dynamically Organized Graph Neural Network
* Effects and Combination of Tailored Browser-Based and Mobile Cognitive Software Training
* Efficient Search and Browsing of Large-Scale Video Collections with Vibro
* Exploring Implicit and Explicit Relations with the Dual Relation-Aware Network for Image Captioning
* Exquisitor at the Video Browser Showdown 2022
* Fall Detection Using Multimodal Data
* Fast CU Depth Decision Algorithm for AVS3
* Fast Single Image Dehazing Using Morphological Reconstruction and Saturation Compensation
* Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
* GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval
* Graph Neural Networks Based Multi-granularity Feature Representation Learning for Fine-Grained Visual Categorization
* Human Activity Recognition with IMU and Vital Signs Feature Fusion
* HyText: A Scene-Text Extraction Method for Video Retrieval
* IBC Reference Block Enhancement Model Based on GAN for Screen Content Video Coding, An
* ILMICA - Interactive Learning Model of Image Collage Assessment: A Transfer Learning Approach for Aesthetic Principles
* Indie Games Popularity Prediction by Considering Multimodal Features
* Investigation into Keystroke Dynamics and Heart Rate Variability as Indicators of Stress, An
* Iterative Correction Phase of Light Field for Novel View Reconstruction, An
* IVIST: Interactive Video Search Tool in VBS 2022
* Joint Re-Detection and Re-Identification for Multi-Object Tracking
* JVCSR: Video Compressive Sensing Reconstruction with Joint In-Loop Reference Enhancement and Out-Loop Super-Resolution
* Learning Image Representation via Attribute-Aware Attention Networks for Fashion Classification
* Learning to Classify Weather Conditions from Single Images Without Labels
* Leveraging Selective Prediction for Reliable Image Geolocation
* Lightweight Wavelet-Based Network for JPEG Artifacts Removal
* LLQA: Lifelog Question Answering Dataset
* Long-Range Feature Dependencies Capturing for Low-Resolution Image Classification
* Making Few-Shot Object Detection Simpler and Less Frustrating
* Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN
* MEViT: Motion Enhanced Video Transformer for Video Classification
* MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis
* MGMP: Multimodal Graph Message Propagation Network for Event Detection
* Mining Minority-Class Examples with Uncertainty Estimates
* MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks
* Multi-Modal Fusion Network for Rumor Detection with Texts and Images
* Multi-modal Interactive Video Retrieval with Temporal Queries
* Multi-modal Semantic Inconsistency Detection in Social Media News Posts
* Multi-Modal Video Retrieval in Virtual Reality with vitrivr-VR
* Multi-object Tracking with a Hierarchical Single-Branch Network
* Multi-scale Cross-Modal Transformer Network for RGB-D Object Detection
* Multimodal Embedding for Lifelog Retrieval
* Multimodal Unsupervised Image-to-Image Translation Without Independent Style Encoder
* Multiple Positives Enhanced NCE Loss for Image-Text Retrieval, A
* Non-Uniform Attention Network for Multi-modal Sentiment Analysis
* Novel Chinese Sarcasm Detection Model Based on Retrospective Reader, A
* On Assisting Diagnoses of Pareidolia by Emulating Patient Behavior
* One-Stage Image Inpainting with Hybrid Attention
* Parallel DBSCAN-Martingale Estimation of the Number of Concepts for Automatic Satellite Image Clustering
* Patching Your Clothes: Semantic-Aware Learning for Cloth-Changed Person Re-Identification
* Personalized Fashion Recommendation Using Pairwise Attention
* PF-VTON: Toward High-Quality Parser-Free Virtual Try-On Network
* PicArrange: Visually Sort, Search, and Explore Private Images on a Mac Computer
* Point Cloud Upsampling via a Coarse-to-Fine Network
* Pose-Enhanced Relation Feature for Action Recognition in Still Images
* Prediction of Blood Glucose Using Contextual LifeLog Data
* Progressive GAN-Based Transfer Network for Low-Light Image Enhancement
* Prostate Segmentation of Ultrasound Images Based on Interpretable-Guided Mathematical Model
* Rating-Aware Self-Organizing Maps
* Real-time Detection of Tiny Objects Based on a Weighted Bi-directional FPN
* Real-Time FPGA Design for OMP Targeting 8K Image Reconstruction
* Reconstructing 3D Contour Models of General Scenes from RGB-D Sequences
* Reinforcement Learning-Based Interactive Video Search
* Rethinking Shared Features and Re-ranking for Cross-Modality Person Re-identification
* SAM: Self Attention Mechanism for Scene Text Recognition Based on Swin Transformer
* Shared Latent Space of Font Shapes and Their Noisy Impressions
* Skeletonization Based on K-Nearest-Neighbors on Binary Image
* Spatiotemporal Perturbation Based Dynamic Consistency for Semi-Supervised Temporal Action Detection
* Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN
* SUnet++: Joint Demosaicing and Denoising of Extreme Low-Light Raw Image
* Task Category Space for User-Centric Comparative Multimedia Search Evaluations, A
* Time-Frequency Attention for Speech Emotion Recognition with Squeeze-and-Excitation Blocks
* Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses
* UIT at VBS 2022: An Unified and Interactive Video Retrieval System with Temporal Search
* Unsupervised Multi-scale Generative Adversarial Network for Remote Sensing Image Pan-Sharpening, An
* Using Explainable AI to Identify Differences Between Clinical and Experimental Pain Detection Models Based on Facial Expressions
* V-FIRST: A Flexible Interactive Retrieval System for Video at VBS 2022
* VERGE in VBS 2022
* Video Search with Context-Aware Ranker and Relevance Feedback
* Videofall: A Hierarchical Search Engine for VBS2022
* ViRMA: Virtual Reality Multimedia Analytics at Video Browser Showdown 2022
* Virtual Reality Reminiscence Interface for Personal Lifelogs, A
* VISIONE at Video Browser Showdown 2022
* XQM: Search-Oriented vs. Classifier-Oriented Relevance Feedback on Mobile Phones
108 for MMMod22

MMMod23 * *Advances in Multimedia Modeling
* Arctic HARE: A Machine Learning-Based System for Performance Analysis of Cross-Country Skiers
* Audio-visual Sensor Fusion Framework Using Person Attributes Robust to Missing Visual Modality for Person Recognition
* AutoRF: Auto Learning Receptive Fields with Spatial Pooling
* BENet: Boundary Enhance Network for Salient Object Detection
* Binary Neural Network for Video Action Recognition
* C-GZS: Controllable Person Image Synthesis Based on Group-Supervised Zero-Shot Learning
* Capturing Nutrition Data for Sports: Challenges and Ethical Issues
* Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation
* CCF-Net: A Cascade Center-based Framework Towards Efficient Human Parts Detection
* CMFG: Cross-model Fine-grained Feature Interaction for Text-video Retrieval
* COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization
* Comparison of Deep Learning Techniques for Video-based Automatic Recognition of Greek Folk Dances
* Context-guided Multi-view Stereo with Depth Back-projection
* Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos, A
* CTDA: Contrastive Temporal Domain Adaptation for Action Segmentation
* DARTS-PAP: Differentiable Neural Architecture Search by Polarization of Instance Complexity Weighted Architecture Parameters
* Deep3DSketch+: Rapid 3D Modeling from Single Free-hand Sketches
* DHP: A Joint Video Download and Dynamic Bitrate Adaptation Algorithm for Short Video Streaming
* DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
* DilatedSegNet: A Deep Dilated Segmentation Network for Polyp Segmentation
* diveXplore at the Video Browser Showdown 2023
* Dual-Feature Aggregation Network for No-Reference Image Quality Assessment
* Dynamic Feature Selection for Structural Image Content Recognition
* Dynamic-static Cross Attentional Feature Fusion Method for Speech Emotion Recognition
* Edge Assisted Asymmetric Convolution Network for MR Image Super-resolution
* Energy Transfer Contrast Network for Unsupervised Domain Adaption
* EvIs-Kitchen: Egocentric Human Activities Recognition with Video and Inertial Sensor Data
* Exploring Effective Interactive Text-Based Video Search in vitrivr
* Fast Accurate Fish Recognition with Deep Learning Based on a Domain-Specific Large-Scale Fish Dataset
* Feature Enhancement and Reconstruction for Small Object Detection
* FL-Former: Flood Level Estimation with Vision Transformer for Images from Cameras in Urban Areas
* Floor Plan Analysis and Vectorization with Multimodal Information
* Free-Form Multi-Modal Multimedia Retrieval (4MR)
* Fusion of Multiple Classifiers Using Self Supervised Learning for Satellite Image Change Detection
* Fusion-Based Low-Light Image Enhancement
* Generating New Paintings by Semantic Guidance
* Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks
* GIGO, Garbage In, Garbage Out: An Urban Garbage Classification Dataset
* Graph-based Data Association in Multiple Object Tracking: A Survey
* Health-Oriented Multimodal Food Question Answering
* HSS: A Hierarchical Semantic Similarity Hard Negative Sampling Method for Dense Retrievers
* Importance of Image Interpretation: Patterns of Semantic Misclassification in Real-world Adversarial Images, The
* Improving Parent-Child Co-Play in a Roblox Game
* Improving the Robustness to Variations of Objects and Instructions with a Neuro-symbolic Approach for Interactive Instruction Following
* In-air Handwritten Chinese Text Recognition with Attention Convolutional Recurrent Network
* Interpretable Driver Fatigue Estimation Based on Hierarchical Symptom Representations
* LAE-Net: Light and Efficient Network for Compressed Video Action Recognition
* Length-sensitive Language-bound Recognition Network for Multilingual Text Recognition, A
* Less Is More: Similarity Models for Content-based Video Retrieval
* Lightweight Image Hashing Based on Knowledge Distillation and Optimal Transport for Face Retrieval
* Lightweight Multi-level Information Fusion Network for Facial Expression Recognition
* Link-Rot in Web-Sourced Multimedia Datasets
* LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement
* Low-light Image Enhancement Based on U-Net and Haar Wavelet Pooling
* Low-light Image Enhancement Under Non-uniform Dark
* Manga Text Detection with Manga-specific Data Augmentation and Its Applications on Emotion Analysis
* Marine Video Kit: A New Marine Video Dataset for Content-Based Analysis and Retrieval
* MCANet: Multiscale Cross-Modality Attention Network for Multispectral Pedestrian Detection
* MCOM-Live: A Multi-codec Optimization Model at the Edge for Live Streaming
* MM-Locate-News: Multimodal Focus Location Estimation in News
* MMM-GCN: Multi-Level Multi-Modal Graph Convolution Network for Video-Based Person Identification
* Multi-scale and Multi-stage Deraining Network with Fourier Space Loss
* Multi-scale Gaussian Difference Preprocessing and Dual Stream CNN-transformer Hybrid Network for Skin Lesion Segmentation
* Multi-stream Fusion Network for Image Splicing Localization, A
* Multi-view Adaptive Bone Activation from Chest X-ray with Conditional Adversarial Nets
* Multimedia Datasets: Challenges and Future Possibilities
* Multimodal Reconstruct and Align Net for Missing Modality Problem in Sentiment Analysis
* Music Instrument Classification Reprogrammed
* NCKU-VTF Dataset and a Multi-scale Thermal-to-Visible Face Synthesis System, The
* Occlusion Model for Spectral Analysis of Light Field Signal, An
* Optimizing Local Feature Representations of 3D Point Clouds with Anisotropic Edge Modeling
* Overall-Distinctive GCN for Social Relation Recognition on Videos
* PEFNet: Positional Embedding Feature for Polyp Segmentation
* People@Places and ToDY: Two Datasets for Scene Classification in Media Production and Archiving
* Perfect Match in Video Retrieval
* Practical Analyses of How Common Social Media Platforms and Photo Storage Services Handle Uploaded Images
* Proposal-improved Few-shot Embedding Model with Contrastive Learning, A
* Pseudo-label Diversity Exploitation for Few-shot Object Detection
* QIVISE: A Quantum-Inspired Interactive Video Search Engine in VBS2023
* Realtime Sitting Posture Recognition on Embedded Device
* Recombining Vision Transformer Architecture for Fine-grained Visual Categorization
* Reinforcement Learning Enhanced PicHunter for Interactive Search
* Research on Multi-task Semantic Segmentation Based on Attention and Feature Fusion Method
* RLSCNet: A Residual Line-Shaped Convolutional Network for Vanishing Point Detection
* Rumor Detection on Social Media by Using Global-local Relations Encoding Network
* Safe Contrastive Clustering
* ScopeSense: An 8.5-Month Sport, Nutrition, and Lifestyle Lifelogging Dataset
* Self-supervised Multi-object Tracking with Cycle-consistency
* Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation
* Soccer Athlete Data Visualization and Analysis with an Interactive Dashboard
* Social Relation Graph Generation on Untrimmed Video
* Space-time Video Super-resolution 3D Transformer
* Spatio-Temporal Identity Verification Method for Person-Action Instance Search in Movies, A
* Spectrum Dependent Depth Layered Model for Optimization Rendering Quality of Light Field, A
* SPEM: Self-adaptive Pooling Enhanced Attention Module for Image Recognition
* Sport and Nutrition Digital Analysis: A Legal Assessment
* SRes-NeRF: Improved Neural Radiance Fields for Realism and Accuracy of Specular Reflections
* STN: Stochastic Triplet Neighboring Approach to Self-supervised Denoising from Limited Noisy Images
* Students Take Charge of Climate Communication
* Study of a Cross-modal Interactive Search Tool Using CLIP and Temporal Fusion, A
* Taylor: Impersonation of AI for Audiovisual Content Documentation and Search
* Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching
* TG-Dance: TransGAN-Based Intelligent Dance Generation with Music
* Toward More Accurate Heterogeneous Iris Recognition with Transformers and Capsules
* Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach
* Towards Deep Personal Lifestyle Models Using Multimodal N-of-1 Data
* Towards Interactive Facial Image Inpainting by Text or Exemplar Image
* Traceable Asynchronous Workflows in Video Retrieval with vitrivr-VR
* Transferable Adversarial Attack on 3D Object Tracking in Point Cloud
* Transformer-based Cross-modal Recipe Embeddings with Large Batch Training
* Transparent Object Detection with Simulation Heatmap Guidance and Context Spatial Attention
* Unsupervised Encoder-decoder Model for Anomaly Prediction Task
* V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023
* VAISL: Visual-aware Identification of Semantic Locations in Lifelog
* VERGE in VBS 2023
* Vibro: Video Browsing with Semantic and Visual Image Embeddings
* Video Search with CLIP and Interactive Text Query Reformulation
* Video-based Precipitation Intensity Recognition Using Dual-dimension and Dual-scale Spatiotemporal Convolutional Neural Network
* VideoCLIP: An Interactive CLIP-based Video Retrieval System at VBS2023
* Virtual Try-On Considering Temporal Consistency for Videoconferencing
* VISIONE at Video Browser Showdown 2023
* Visual Question Generation Under Multi-granularity Cross-Modal Interaction
* Weakly-Supervised Temporal Action Localization with Regional Similarity Consistency
* Weighted Multi-view Clustering Based on Internal Evaluation
125 for MMMod23

