| Weekly Topic | Papers for Presentation | Resource |
| --- | --- | --- |
| Deep Learning | [Chen et al. 2020] A Simple Framework for Contrastive Learning of Visual Representations, ICML | [Paper] [GitHub] |
| | [He et al. 2020] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR | [Paper] [GitHub] |
| | [Grill et al. 2020] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS | [Paper] [GitHub] |
| Convolutional Neural Networks (CNN) | [Tan et al. 2019] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML | [Paper] [GitHub] |
| | [Liu et al. 2022] A ConvNet for the 2020s, CVPR | [Paper] [GitHub] |
| | [Woo et al. 2023] ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, CVPR | [Paper] [GitHub] |
| Transformer | [Dosovitskiy et al. 2021] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR | [Paper] [GitHub] |
| | [Caron et al. 2021] Emerging Properties in Self-Supervised Vision Transformers, ICCV | [Paper] [GitHub] |
| | [He et al. 2022] Masked Autoencoders Are Scalable Vision Learners, CVPR | [Paper] [GitHub] |
| Overview of Generative AI | [Isola et al. 2017] Image-to-Image Translation with Conditional Adversarial Networks, CVPR | [Paper] [GitHub] |
| | [Devlin et al. 2019] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL | [Paper] [GitHub] |
| | [Dhariwal et al. 2021] Diffusion Models Beat GANs on Image Synthesis, NeurIPS | [Paper] [GitHub] |
| Generative Models | [Mildenhall et al. 2020] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV | [Paper] [GitHub] |
| | [Rombach et al. 2022] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR | [Paper] [GitHub] |
| | [Kerbl et al. 2023] 3D Gaussian Splatting for Real-Time Radiance Field Rendering, SIGGRAPH | [Paper] [GitHub] |
| Large Language Model (LLM) | [Chowdhery et al. 2022] PaLM: Scaling Language Modeling with Pathways, JMLR | [Paper] [GitHub] |
| | [Wei et al. 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS | [Paper] |
| | [Ouyang et al. 2022] Training Language Models to Follow Instructions with Human Feedback, NeurIPS | [Paper] [GitHub] |
| Multimodal Learning | [Radford et al. 2021] Learning Transferable Visual Models from Natural Language Supervision, ICML | [Paper] [GitHub] |
| | [Radford et al. 2023] Robust Speech Recognition via Large-Scale Weak Supervision, ICML | [Paper] [GitHub] |
| | [Girdhar et al. 2023] ImageBind: One Embedding Space to Bind Them All, CVPR | [Paper] [GitHub] |
| Large Multimodal Model (LMM) | [Alayrac et al. 2022] Flamingo: A Visual Language Model for Few-Shot Learning, NeurIPS | [Paper] [GitHub] |
| | [Liu et al. 2023] Visual Instruction Tuning, NeurIPS | [Paper] [GitHub] |
| | [Wu et al. 2024] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, NeurIPS | [Paper] [GitHub] |
| AI in Physical World | [Driess et al. 2023] PaLM-E: An Embodied Multimodal Language Model, ICML | [Paper] [GitHub] |
| | [Kim et al. 2024] OpenVLA: An Open-Source Vision-Language-Action Model, arXiv | [Paper] [GitHub] |
| | [Wang et al. 2025] Magma: A Foundation Model for Multimodal AI Agents, CVPR | [Paper] [GitHub] |
| Ethics, Fairness, and AI Safety | [Tao et al. 2024] When to Trust LLMs: Aligning Confidence with Response Quality, ACL | [Paper] [GitHub] |
| | [Zhao et al. 2024] A Taxonomy of Challenges to Curating Fair Datasets, NeurIPS | [Paper] |
| | [Li et al. 2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation, CVPR | [Paper] [GitHub] |