my resources for learning how llms work and how to optimize them
writing
- jeff dean, sanjay ghemawat - performance hints
- chris olah - visual information theory
- jochen görtler - a visual exploration of gaussian processes
- gregory gundersen - a history of large language models
- kipply - transformer inference arithmetic
- christopher fleetwood - domain specific architectures for ai inference
- damek davis - basic facts about gpus
- modal gpu glossary - performance
- horace he - making deep learning go brrrr from first principles
- writing high-performance matrix multiplication kernels for blackwell
- simon boehm - how to optimize a cuda matmul kernel for cublas-like performance: a worklog
- pranjal shankhdhar - outperforming cublas on h100: a worklog
- jacob austin et al. - how to scale your model
- hugging face - the ultra-scale playbook: training llms on gpu clusters
- horace he - defeating nondeterminism in llm inference
- arkar min aung - turboquant: a first-principles walkthrough
- sankalp shubham - how prompt caching works: paged attention and automatic prefix caching, plus practical tips
- sam rose - prompt caching: 10x cheaper llm tokens, but how?
- max mynter - becoming a research engineer at a big llm lab: 18 months of strategic career development