---
title: llm learning
description: my resources for learning how llms work and how to optimize them
template: page
---

my resources for learning how llms work and how to optimize them

## writing

- [jeff dean, sanjay ghemawat - performance hints](https://abseil.io/fast/hints.html)
- [chris olah - visual information theory](https://colah.github.io/posts/2015-09-Visual-Information/)
- [jochen görtler - a visual exploration of gaussian processes](https://distill.pub/2019/visual-exploration-gaussian-processes/)
- [gregory gundersen - a history of large language models](https://gregorygundersen.com/blog/2025/10/01/large-language-models/)
- [kipply - transformer inference arithmetic](https://kipp.ly/p/transformer-inference-arithmetic)
- [christopher fleetwood - domain specific architectures for ai inference](https://fleetwood.dev/posts/domain-specific-architectures)
- [damek davis - basic facts about gpus](https://damek.github.io/random/basic-facts-about-gpus/)
- [modal gpu glossary - performance](https://modal.com/gpu-glossary/perf)
- [horace he - making deep learning go brrrr from first principles](https://horace.io/brrr_intro.html)
- [writing high-performance matrix multiplication kernels for blackwell](https://docs.jax.dev/en/latest/pallas/gpu/blackwell_matmul.html)
- [simon boehm - how to optimize a cuda matmul kernel for cublas-like performance: a worklog](https://siboehm.com/articles/22/CUDA-MMM)
- [pranjal shankhdhar - outperforming cublas on h100: a worklog](https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog)
- [how to scale your model](https://jax-ml.github.io/scaling-book/)
- [the ultra-scale playbook: training llms on gpu clusters](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- [horace he - defeating nondeterminism in llm inference](https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/)
- [arkar min aung - turboquant: a first-principles walkthrough](https://arkaung.github.io/interactive-turboquant/)
- [sankalp shubham - how prompt caching works - paged attention and automatic prefix caching plus practical tips](https://sankalp.bearblog.dev/how-prompt-caching-works/)
- [sam rose - prompt caching: 10x cheaper llm tokens, but how?](https://ngrok.com/blog/prompt-caching)
- [max mynter - becoming a research engineer at a big llm lab -- 18 months of strategic career development](https://www.maxmynter.com/pages/blog/jobhunt)

## videos

- [reiner pope - lecture on how llms are trained and served](https://www.youtube.com/watch?v=xmkSf5IS-zw) - [accompanying flashcards](https://reiner-flashcards.vercel.app/)
