

Paper Reading List


Weekly topics and papers for presentation, with [Paper] and [GitHub] resource links.
Deep Learning
- [Chen et al. 2020] A Simple Framework for Contrastive Learning of Visual Representations, ICML [Paper] [GitHub]
- [He et al. 2020] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR [Paper] [GitHub]
- [Grill et al. 2020] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS [Paper] [GitHub]
Convolutional Neural Networks (CNN)
- [Tan et al. 2019] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML [Paper] [GitHub]
- [Liu et al. 2022] A ConvNet for the 2020s, CVPR [Paper] [GitHub]
- [Woo et al. 2023] ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, CVPR [Paper] [GitHub]
Transformer
- [Dosovitskiy et al. 2021] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR [Paper] [GitHub]
- [Caron et al. 2021] Emerging Properties in Self-Supervised Vision Transformers, ICCV [Paper] [GitHub]
- [He et al. 2022] Masked Autoencoders Are Scalable Vision Learners, CVPR [Paper] [GitHub]
Overview of Generative AI
- [Isola et al. 2017] Image-to-Image Translation with Conditional Adversarial Networks, CVPR [Paper] [GitHub]
- [Devlin et al. 2019] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL [Paper] [GitHub]
- [Dhariwal et al. 2021] Diffusion Models Beat GANs on Image Synthesis, NeurIPS [Paper] [GitHub]
Generative Models
- [Mildenhall et al. 2020] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV [Paper] [GitHub]
- [Rombach et al. 2022] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR [Paper] [GitHub]
- [Kerbl et al. 2023] 3D Gaussian Splatting for Real-time Radiance Field Rendering, SIGGRAPH [Paper] [GitHub]
Large Language Model (LLM)
- [Chowdhery et al. 2022] PaLM: Scaling Language Modeling with Pathways, JMLR [Paper] [GitHub]
- [Wei et al. 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS [Paper]
- [Ouyang et al. 2022] Training Language Models to Follow Instructions with Human Feedback, NeurIPS [Paper] [GitHub]
Multimodal Learning
- [Radford et al. 2021] Learning Transferable Visual Models from Natural Language Supervision, ICML [Paper] [GitHub]
- [Radford et al. 2023] Robust Speech Recognition via Large-scale Weak Supervision, ICML [Paper] [GitHub]
- [Girdhar et al. 2023] ImageBind: One Embedding Space to Bind Them All, CVPR [Paper] [GitHub]
Large Multimodal Model (LMM)
- [Alayrac et al. 2022] Flamingo: A Visual Language Model for Few-Shot Learning, NeurIPS [Paper] [GitHub]
- [Liu et al. 2023] Visual Instruction Tuning, NeurIPS [Paper] [GitHub]
- [Wu et al. 2024] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, NeurIPS [Paper] [GitHub]
AI in the Physical World
- [Driess et al. 2023] PaLM-E: An Embodied Multimodal Language Model, ICML [Paper] [GitHub]
- [Kim et al. 2024] OpenVLA: An Open-Source Vision-Language-Action Model, arXiv [Paper] [GitHub]
- [Wang et al. 2025] Magma: A Foundation Model for Multimodal AI Agents, CVPR [Paper] [GitHub]
Ethics, Fairness, and AI Safety
- [Tao et al. 2024] When to Trust LLMs: Aligning Confidence with Response Quality, ACL [Paper] [GitHub]
- [Zhao et al. 2024] A Taxonomy of Challenges to Curating Fair Datasets, NeurIPS [Paper]
- [Li et al. 2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation, CVPR [Paper] [GitHub]