| Weekly Topic | Papers for Presentation | Resource |
| --- | --- | --- |
| Deep Learning | [Chen et al. 2020] A Simple Framework for Contrastive Learning of Visual Representations, ICML | [Paper] [GitHub] |
| | [He et al. 2020] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR | [Paper] [GitHub] |
| | [Grill et al. 2020] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS | [Paper] [GitHub] |
| Convolutional Neural Networks (CNN) | [Tan et al. 2019] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML | [Paper] [GitHub] |
| | [Liu et al. 2022] A ConvNet for the 2020s, CVPR | [Paper] [GitHub] |
| | [Woo et al. 2023] ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, CVPR | [Paper] [GitHub] |
| Transformer | [Dosovitskiy et al. 2021] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR | [Paper] [GitHub] |
| | [Caron et al. 2021] Emerging Properties in Self-Supervised Vision Transformers, ICCV | [Paper] [GitHub] |
| | [He et al. 2022] Masked Autoencoders Are Scalable Vision Learners, CVPR | [Paper] [GitHub] |
| Overview of Generative AI | [Isola et al. 2017] Image-to-Image Translation with Conditional Adversarial Networks, CVPR | [Paper] [GitHub] |
| | [Devlin et al. 2019] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL | [Paper] [GitHub] |
| | [Dhariwal et al. 2021] Diffusion Models Beat GANs on Image Synthesis, NeurIPS | [Paper] [GitHub] |
| Generative Models | [Mildenhall et al. 2020] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV | [Paper] [GitHub] |
| | [Rombach et al. 2022] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR | [Paper] [GitHub] |
| | [Kerbl et al. 2023] 3D Gaussian Splatting for Real-Time Radiance Field Rendering, SIGGRAPH | [Paper] [GitHub] |
| Large Language Model (LLM) | [Chowdhery et al. 2022] PaLM: Scaling Language Modeling with Pathways, JMLR | [Paper] [GitHub] |
| | [Wei et al. 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS | [Paper] |
| | [Ouyang et al. 2022] Training Language Models to Follow Instructions with Human Feedback, NeurIPS | [Paper] [GitHub] |
| Multimodal Learning | [Radford et al. 2021] Learning Transferable Visual Models from Natural Language Supervision, ICML | [Paper] [GitHub] |
| | [Radford et al. 2023] Robust Speech Recognition via Large-Scale Weak Supervision, ICML | [Paper] [GitHub] |
| | [Girdhar et al. 2023] ImageBind: One Embedding Space to Bind Them All, CVPR | [Paper] [GitHub] |
| Large Multimodal Model (LMM) | [Alayrac et al. 2022] Flamingo: A Visual Language Model for Few-Shot Learning, NeurIPS | [Paper] [GitHub] |
| | [Liu et al. 2023] Visual Instruction Tuning, NeurIPS | [Paper] [GitHub] |
| | [Wu et al. 2024] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, NeurIPS | [Paper] [GitHub] |
| AI in the Physical World | [Driess et al. 2023] PaLM-E: An Embodied Multimodal Language Model, ICML | [Paper] [GitHub] |
| | [Kim et al. 2024] OpenVLA: An Open-Source Vision-Language-Action Model, arXiv | [Paper] [GitHub] |
| | [Wang et al. 2025] Magma: A Foundation Model for Multimodal AI Agents, CVPR | [Paper] [GitHub] |
| Ethics, Fairness, and AI Safety | [Tao et al. 2024] When to Trust LLMs: Aligning Confidence with Response Quality, ACL | [Paper] [GitHub] |
| | [Zhao et al. 2024] A Taxonomy of Challenges to Curating Fair Datasets, NeurIPS | [Paper] |
| | [Li et al. 2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation, CVPR | [Paper] [GitHub] |