Large-Scale18
* *YouTube-8M Large-Scale Video Understanding Workshop
* 2nd YouTube-8M Large-Scale Video Understanding Challenge, The
* Approach for Video Classification with Multi-label on YouTube-8M Dataset
* Building A Size Constrained Predictive Models for Video Classification
* Constrained-Size Tensorflow Models for YouTube-8M Video Understanding Challenge
* Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network
* Label Denoising with Large Ensembles of Heterogeneous Neural Networks
* Large-Scale Video Classification with Feature Space Augmentation Coupled with Learned Label Relations and Ensembling
* Learnable Pooling Methods for Video Classification
* Learning Video Features for Multi-label Classification
* NeXtVLAD: An Efficient Neural Network to Aggregate Frame-Level Features for Large-Scale Video Classification
* Non-local NetVLAD Encoding for Video Classification
* Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-label Video Classification
* Towards Good Practices for Multi-modal Fusion in Large-Scale Video Classification
* Training Compact Deep Learning Models for Video Classification Using Circulant Matrices
15 for Large-Scale18
LargeVM24
* *Efficient Large Vision Models
* Adapting the Segment Anything Model During Usage in Novel Situations
* Adaptive Memory Replay for Continual Learning
* Efficient Transformer Adaptation with Soft Token Merging
* EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
* HaLViT: Half of the Weights are Enough
* Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
* PMAFusion: Projection-Based Multi-Modal Alignment for 3D Semantic Occupancy Prediction
* QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers
* SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
10 for LargeVM24