| Weekly Topic | Papers for Presentation | Resource |
| --- | --- | --- |
| Deep Learning | [Chen et al. 2020] A Simple Framework for Contrastive Learning of Visual Representations, ICML | [Paper] [GitHub] |
| | [He et al. 2020] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR | [Paper] [GitHub] |
| | [Grill et al. 2020] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS | [Paper] [GitHub] |
| Convolutional Neural Networks (CNN) | [Tan et al. 2019] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML | [Paper] [GitHub] |
| | [Liu et al. 2022] A ConvNet for the 2020s, CVPR | [Paper] [GitHub] |
| | [Woo et al. 2023] ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, CVPR | [Paper] [GitHub] |
| Transformer | [Dosovitskiy et al. 2021] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR | [Paper] [GitHub] |
| | [Caron et al. 2021] Emerging Properties in Self-Supervised Vision Transformers, ICCV | [Paper] [GitHub] |
| | [He et al. 2022] Masked Autoencoders Are Scalable Vision Learners, CVPR | [Paper] [GitHub] |
| Overview of Generative AI | [Isola et al. 2017] Image-to-Image Translation with Conditional Adversarial Networks, CVPR | [Paper] [GitHub] |
| | [Devlin et al. 2019] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL | [Paper] [GitHub] |
| | [Dhariwal et al. 2021] Diffusion Models Beat GANs on Image Synthesis, NeurIPS | [Paper] [GitHub] |
| Generative Models | [Mildenhall et al. 2020] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV | [Paper] [GitHub] |
| | [Rombach et al. 2022] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR | [Paper] [GitHub] |
| | [Kerbl et al. 2023] 3D Gaussian Splatting for Real-time Radiance Field Rendering, SIGGRAPH | [Paper] [GitHub] |
| Large Language Model (LLM) | [Chowdhery et al. 2022] PaLM: Scaling Language Modeling with Pathways, JMLR | [Paper] [GitHub] |
| | [Wei et al. 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS | [Paper] |
| | [Ouyang et al. 2022] Training Language Models to Follow Instructions with Human Feedback, NeurIPS | [Paper] [GitHub] |
| Multimodal Learning | [Radford et al. 2021] Learning Transferable Visual Models from Natural Language Supervision, ICML | [Paper] [GitHub] |
| | [Radford et al. 2023] Robust Speech Recognition via Large-scale Weak Supervision, ICML | [Paper] [GitHub] |
| | [Girdhar et al. 2023] ImageBind: One Embedding Space to Bind Them All, CVPR | [Paper] [GitHub] |
| Large Multimodal Model (LMM) | [Alayrac et al. 2022] Flamingo: A Visual Language Model for Few-Shot Learning, NeurIPS | [Paper] [GitHub] |
| | [Liu et al. 2023] Visual Instruction Tuning, NeurIPS | [Paper] [GitHub] |
| | [Wu et al. 2024] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, NeurIPS | [Paper] [GitHub] |
| AI in the Physical World | [Driess et al. 2023] PaLM-E: An Embodied Multimodal Language Model, ICML | [Paper] [GitHub] |
| | [Kim et al. 2024] OpenVLA: An Open-Source Vision-Language-Action Model, arXiv | [Paper] [GitHub] |
| | [Wang et al. 2025] Magma: A Foundation Model for Multimodal AI Agents, CVPR | [Paper] [GitHub] |
| Ethics, Fairness, and AI Safety | [Tao et al. 2024] When to Trust LLMs: Aligning Confidence with Response Quality, ACL | [Paper] [GitHub] |
| | [Zhao et al. 2024] A Taxonomy of Challenges to Curating Fair Datasets, NeurIPS | [Paper] |
| | [Li et al. 2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation, CVPR | [Paper] [GitHub] |