- Large Language Models: from Transformer to ChatGPT
An overview of diverse topics related to LLMs.
slides
- Transformer, BERT, T5, GPT, scaling laws (scaling-law sketch below)
- InstructGPT / ChatGPT = GPT + Instruction-tuning + Alignment-tuning via RLHF (Reinforcement Learning from Human Feedback) (reward-model loss sketch below)
- Prompt engineering: Chain-of-Thought (prompt example below)
- Parameter-Efficient Fine-tuning: Adapter-tuning, Prefix-tuning, Low-Rank Adaptation (LoRA) (LoRA sketch below)
- Quantization: bfloat16, 8-bit optimizer, 4-bit QLoRA (int8 quantization sketch below)
- Transformer upgrades: Rotary Position Embedding (RoPE), Attention with Linear Biases (ALiBi), Grouped-Query Attention (GQA) (RoPE sketch below)
- Open data for pre-training, instruction-tuning, and alignment-tuning; data augmentation (Self-Instruct)
- Evaluation: Massive Multitask Language Understanding (MMLU)
- Engineering: Data parallelism + Tensor parallelism + Pipeline parallelism = 3D parallelism for training
- Future directions: Multi-modality, Mixture-of-Experts (MoE), Retrieval Augmented Generation (RAG)
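
Scaling-law sketch referenced above: a minimal illustration assuming a Chinchilla-style parametric form L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The coefficients below are illustrative placeholders, not values fitted in any particular paper.

```python
# Hedged sketch of a parametric scaling law; coefficients are placeholders.
def parametric_loss(n_params: float, n_tokens: float,
                    E: float = 1.7, A: float = 400.0, B: float = 400.0,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss under the assumed form E + A/N^alpha + B/D^beta."""
    return E + A / n_params**alpha + B / n_tokens**beta

if __name__ == "__main__":
    # Doubling the training tokens at fixed model size lowers the predicted loss.
    print(parametric_loss(7e9, 1.0e12))
    print(parametric_loss(7e9, 2.0e12))
```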
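
Reward-model loss sketch referenced above: in InstructGPT-style RLHF, a reward model is first trained on human preference pairs with a Bradley-Terry pairwise loss that pushes the score of the preferred response above the rejected one. The scalar rewards below are made-up numbers standing in for a reward model's outputs.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    # Scalar rewards for a batch of (chosen, rejected) response pairs.
    r_chosen = torch.tensor([1.2, 0.3, 0.8])
    r_rejected = torch.tensor([0.1, 0.5, -0.2])
    print(preference_loss(r_chosen, r_rejected))
```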
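
Prompt example referenced above: a few-shot chain-of-thought prompt whose exemplar spells out intermediate reasoning steps before the final answer. The first exemplar mirrors the well-known tennis-ball example from the chain-of-thought literature; the second question is invented for illustration.

```python
# Few-shot chain-of-thought prompt: the exemplar shows step-by-step reasoning,
# nudging the model to reason before answering the new question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A library had 120 books and received 4 boxes of 25 books each.
How many books does it have now?
A:"""

if __name__ == "__main__":
    print(COT_PROMPT)
```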
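
LoRA sketch referenced above: a linear layer whose frozen pre-trained weight is augmented with a trainable low-rank update scaled by alpha / r, so only r * (d_in + d_out) extra parameters are trained. A from-scratch illustration, not the API of any particular LoRA library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze the pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))    # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update (x A^T) B^T.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

if __name__ == "__main__":
    layer = LoRALinear(512, 512, r=8)
    out = layer(torch.randn(4, 512))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(out.shape, trainable)   # torch.Size([4, 512]) 8192
```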
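
Quantization sketch referenced above: per-tensor absmax int8 quantization, the simplest form of the idea behind 8-bit storage. Real 8-bit optimizers and 4-bit QLoRA use block-wise and NormalFloat variants; this is only the basic per-tensor case.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Return (int8 tensor, scale) such that x is approximately q.float() * scale."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

if __name__ == "__main__":
    w = torch.randn(256, 256)
    q, s = quantize_int8(w)
    err = (dequantize_int8(q, s) - w).abs().mean()
    print(q.dtype, s.item(), err.item())   # int8 storage, small reconstruction error
```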
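
RoPE sketch referenced above: channel pairs of the query/key vectors are rotated by position-dependent angles, so the q-k dot product depends only on the relative offset between positions. Written from the published formulation using the half-split pairing convention common in open implementations, not copied from any specific codebase.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, num_heads, head_dim)."""
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-pair rotation frequencies and per-position angles.
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[:, None, :]    # (seq_len, 1, half), broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

if __name__ == "__main__":
    q = torch.randn(16, 8, 64)        # (seq_len, heads, head_dim)
    print(rope(q).shape)              # torch.Size([16, 8, 64])
```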