The papers that shaped modern AI, from the original Transformer to the latest breakthroughs. Understanding the science behind the technology.
The landmark papers that established the field. Essential reading for understanding modern AI.
Vaswani et al. (Google) • 2017
120,000+ citations
Introduced the Transformer architecture, eliminating recurrence and convolutions in favor of self-attention. The foundation of all modern LLMs including GPT, Claude, and Gemini.
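To make the core mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the building block the paper introduces. The single-head setup, dimensions, and random weights are illustrative only.

```python
# Minimal sketch of scaled dot-product self-attention (single head), in NumPy.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```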
Devlin et al. (Google) • 2018
90,000+ citations
Pioneered bidirectional pre-training for language understanding. Introduced masked language modeling and established the pre-train/fine-tune paradigm.
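A rough sketch of the masked-language-modeling setup: hide a random subset of tokens and train the model to recover them. The token IDs, the MASK_ID value, and the -100 "ignore" label are placeholders; the actual BERT recipe also leaves some selected tokens unchanged or swaps them for random ones.

```python
# Simplified masked language modeling data preparation (BERT-style).
import random

MASK_ID = 103          # placeholder id for the [MASK] token
MASK_PROB = 0.15       # fraction of tokens selected for prediction

def mask_tokens(token_ids, rng=random.Random(0)):
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < MASK_PROB:
            inputs.append(MASK_ID)   # model sees [MASK] ...
            labels.append(tok)       # ...and must predict the original token
        else:
            inputs.append(tok)
            labels.append(-100)      # conventional "ignore this position" label
    return inputs, labels

print(mask_tokens([7, 42, 9, 311, 5, 88]))
```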
Brown et al. (OpenAI) • 2020
35,000+ citations
Demonstrated that scaling language models to 175B parameters enables few-shot learning without fine-tuning. Sparked the current LLM era.
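Few-shot learning here means the "training" happens entirely in the prompt. A minimal prompt-building sketch, assuming a sentiment-classification task; the examples and labels are invented for illustration.

```python
# Building a few-shot (in-context learning) prompt, GPT-3 style.
EXAMPLES = [
    ("The movie was fantastic.", "positive"),
    ("I want my money back.", "negative"),
]

def few_shot_prompt(query):
    lines = ["Classify the sentiment of each review."]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")   # model completes the label
    return "\n\n".join(lines)

print(few_shot_prompt("A bland, forgettable film."))
```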
Ouyang et al. (OpenAI) • 2022
8,000+ citations
Introduced RLHF (Reinforcement Learning from Human Feedback) to align language models with human intent. Foundation of ChatGPT's helpfulness.
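At the heart of the RLHF recipe is a reward model trained on human preference pairs; the policy is then optimized against it (e.g. with PPO). A NumPy sketch of that pairwise preference loss only, with placeholder reward values.

```python
# Pairwise preference loss used to train an RLHF reward model.
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), averaged over preference pairs."""
    diff = np.asarray(reward_chosen) - np.asarray(reward_rejected)
    return float(np.mean(np.log1p(np.exp(-diff))))  # log1p(exp(-d)) == -log(sigmoid(d))

# Loss is lower when the reward model already ranks the human-preferred answer higher.
print(preference_loss([2.0, 0.5], [1.0, 1.5]))
```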
He et al. (Microsoft) • 2015
180,000+ citations
Introduced residual connections enabling training of very deep networks. Won ImageNet 2015 and revolutionized deep learning architecture design.
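The residual idea fits in one line of code: each block learns a correction F(x) and adds it back to its input, so gradients can flow through the identity path. A minimal NumPy sketch with made-up dense weights; real ResNet blocks use convolutions and batch normalization.

```python
# A toy residual block: output = activation(x + F(x)).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    residual = relu(x @ W1) @ W2          # F(x): the learned correction
    return relu(x + residual)             # skip connection adds x back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
W1 = rng.normal(size=(16, 16)) * 0.1
W2 = rng.normal(size=(16, 16)) * 0.1
print(residual_block(x, W1, W2).shape)    # (1, 16)
```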
Goodfellow et al. • 2014
65,000+ citations
Introduced the adversarial training paradigm with generator and discriminator networks. Pioneered modern generative AI.
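The adversarial setup in miniature: the discriminator learns to separate real from generated samples while the generator learns to fool it. This sketch only evaluates the two standard losses on placeholder discriminator outputs; no networks are actually trained here.

```python
# The GAN minimax objective, evaluated on placeholder discriminator scores.
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative.
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # Non-saturating form: G maximizes log D(G(z)).
    return float(-np.mean(np.log(d_fake)))

d_real = np.array([0.9, 0.8])   # D's probabilities on real samples
d_fake = np.array([0.2, 0.3])   # D's probabilities on generated samples
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```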
The cutting-edge research pushing AI capabilities forward. These papers represent the current frontier.
Tian et al. • 2024
New image generation approach that predicts images scale by scale, from coarse to fine resolutions. Outperforms diffusion transformers and exhibits LLM-like scaling laws.
Meta AI • 2024
405B-parameter dense Transformer competitive with GPT-4-class models. Implements grouped query attention and a 128K-token context. Open weights accelerated the field.
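A shape-level sketch of the grouped query attention mentioned above: many query heads share a smaller set of key/value heads, shrinking the KV cache at inference time. Head counts and dimensions here are illustrative, not Llama 3's actual configuration.

```python
# Grouped query attention: query heads share a reduced set of K/V heads.
import numpy as np

def grouped_query_attention(Q, K, V):
    """Q: (n_q_heads, seq, d); K, V: (n_kv_heads, seq, d)."""
    n_q, n_kv = Q.shape[0], K.shape[0]
    group = n_q // n_kv                      # query heads per shared KV head
    outs = []
    for h in range(n_q):
        k, v = K[h // group], V[h // group]  # reuse the group's keys/values
        scores = Q[h] @ k.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)        # row-wise softmax
        outs.append(w @ v)
    return np.stack(outs)                    # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 4, 16))              # 8 query heads
K = rng.normal(size=(2, 4, 16))              # only 2 shared KV heads
V = rng.normal(size=(2, 4, 16))
print(grouped_query_attention(Q, K, V).shape)
```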
Gu & Dao • 2023
Proposed alternative to Transformers using selective state spaces. Achieves linear-time complexity in sequence length while matching Transformer quality.
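The linear-time claim comes from processing the sequence as a recurrence rather than with all-pairs attention. Below is a heavily simplified scalar-state sketch; Mamba's actual selective scan makes the A, B, C parameters input-dependent and runs as a hardware-aware parallel scan.

```python
# A toy state space recurrence: one pass over the sequence, O(seq_len) cost.
import numpy as np

def ssm_scan(x, A=0.9, B=0.1, C=1.0):
    """h_t = A*h_{t-1} + B*x_t ;  y_t = C*h_t  (single scalar state)."""
    h, ys = 0.0, []
    for x_t in x:                 # linear in sequence length
        h = A * h + B * x_t       # update the hidden state
        ys.append(C * h)          # read out the output
    return np.array(ys)

print(ssm_scan(np.array([1.0, 0.0, 0.0, 2.0])))
```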
Dao • 2023
Optimized GPU implementation of attention achieving 2x speedup over FlashAttention. Enables longer sequences and faster training.
Bai et al. (Anthropic) • 2022
Training AI to follow explicit principles without extensive human labeling. Self-critique and revision against a written constitution reduce harmful outputs.
Lewis et al. (Meta) • 2020
Combined retrieval with generation to ground LLM responses in external knowledge. Reduces hallucinations and enables access to current information.
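The retrieval-augmented pipeline in outline: embed the query, fetch the most similar documents, and prepend them to the prompt. The embeddings below are random stand-ins; a real system would use a trained encoder, a vector index, and an actual LLM call.

```python
# Retrieval-augmented generation (RAG), reduced to retrieval + prompt assembly.
import numpy as np

rng = np.random.default_rng(0)
DOCS = ["Doc A about transformers.", "Doc B about diffusion.", "Doc C about RLHF."]
DOC_VECS = rng.normal(size=(len(DOCS), 32))          # pretend document embeddings

def embed(text):
    return rng.normal(size=32)                        # placeholder encoder

def retrieve(query, k=2):
    q = embed(query)
    sims = DOC_VECS @ q / (np.linalg.norm(DOC_VECS, axis=1) * np.linalg.norm(q))
    return [DOCS[i] for i in np.argsort(-sims)[:k]]   # top-k by cosine similarity

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does RLHF work?"))
```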
Ho et al. (Google) • 2020
Established diffusion models as state-of-the-art for image generation. Foundation for DALL-E, Stable Diffusion, and Midjourney.
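The forward (noising) half of a diffusion model has a simple closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, and the network is trained to undo it. A NumPy sketch with an illustrative linear noise schedule; no denoising network is included.

```python
# DDPM forward process: progressively corrupt data with Gaussian noise.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)         # cumulative product \bar{alpha}_t

def noise_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.ones(4)                              # a toy "image"
print(noise_sample(x0, t=10))                # early step: mostly signal
print(noise_sample(x0, t=999))               # final step: almost pure noise
```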
Darcet et al. (Meta) • 2024
Discovered high-norm artifact tokens in ViT feature maps. Adding dedicated register tokens removes the artifacts, smoothing attention maps and improving dense prediction tasks.
Munkhdalai et al. (Google) • 2024
Method to scale Transformers to unboundedly long inputs with bounded memory and compute. Combines local attention with a compressive long-term memory.
Vasudeva et al. • 2025
Investigates why Transformers consistently outperform earlier architectures, showing they naturally learn 'low sensitivity' functions whose outputs stay stable under small input perturbations.
The major directions of current AI research and their significance.
Ensuring AI systems behave as intended and remain beneficial
Constitutional AI • InstructGPT • Anthropic's interpretability work
Critical focus as models become more capable
Reducing compute and memory requirements for inference and training (quantization sketched below)
Mamba • FlashAttention • Mixture of Experts • Quantization
Essential for deployment at scale
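As one concrete example of the efficiency work listed above, here is int8 weight quantization in miniature: map float weights to 8-bit integers with a per-tensor scale, then dequantize at use time. Real schemes (per-channel scales, activation quantization, GPTQ/AWQ) are considerably more involved; this is only the basic idea.

```python
# Symmetric int8 weight quantization with a single per-tensor scale.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale      # approximate float reconstruction

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())    # small reconstruction error
```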
Integrating text, vision, audio, and other modalities
GPT-4V technical report • Gemini • LLaVA • CLIP
Moving toward unified perception models
Improving logical reasoning and multi-step problem solving (prompt example below)
Chain-of-Thought • Tree of Thoughts • ReAct • Self-consistency
Key differentiator for advanced applications
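Chain-of-thought prompting, listed above, is simple enough to show directly: include worked reasoning in the prompt (or ask for it) so the model emits intermediate steps before its final answer. The example problem and wording are invented for illustration.

```python
# Building a one-shot chain-of-thought prompt.
def cot_prompt(question):
    demo = (
        "Q: A farmer has 3 pens with 4 sheep each. He sells 5 sheep. "
        "How many are left?\n"
        "A: 3 pens * 4 sheep = 12 sheep. 12 - 5 = 7. The answer is 7.\n\n"
    )
    return demo + f"Q: {question}\nA: Let's think step by step."

print(cot_prompt("A train travels 60 km/h for 2.5 hours. How far does it go?"))
```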
Understanding how models make decisions internally
Anthropic's mechanistic interpretability • Activation patching • Feature visualization
Anthropic has set a goal of reliably detecting most model problems via interpretability by 2027
FullAI implements the latest research breakthroughs. Experience state-of-the-art AI in your applications.
Start Building for Free