Djolonga, J.[Josip]
Co Author Listing
* End-to-End Spatio-Temporal Action Localisation with Video Transformers
* Higher-Order Inference for Multi-class Log-Supermodular Models
* On Robustness and Transferability of Convolutional Neural Networks
* On Scaling Up a Multilingual Vision and Language Model
* Representation learning from videos in-the-wild: An object-centric approach
* Self-Supervised Learning of Video-Induced Visual Invariances