vllm-lab
Measured vLLM's core engine mechanisms one at a time on a 6 GB GPU: continuous batching (62×), prefix caching, PagedAttention preemption, fp8 KV cache, and more. Eight demos, each with a live dashboard.
vLLM, CUDA, Python
llm-from-scratch
A GPT-2-class language model built from first principles in 8 stages, from tokenizer through attention and pretraining to sampling. Covered by pytest tests, with guided CodeTours that walk through every line.
Python, PyTorch, Transformers
VibeThinker-3B-W4A16
Quantized a 3B reasoning model from 5.8 GB to 2.0 GB (W4A16 / GPTQ) so it fits and serves on a 6 GB GPU at ~67 tok/s in vLLM. Published to the Hugging Face Hub.
llmcompressor, GPTQ, vLLM