• Large Language Models: from Transformer to ChatGPT
    An overview of diverse topics related to LLMs.
    slides
    • Transformer, BERT, T5, GPT, scaling laws
    • InstructGPT / ChatGPT = GPT + Instruction-tuning + Alignment-tuning (RLHF: Reinforcement Learning from Human Feedback)
    • Prompt engineering: Chain-of-Thought (CoT) prompting (see the prompt sketch after this list)
    • Parameter-Efficient Fine-tuning: Adapter-tuning, Prefix-tuning, Low-Rank Adaptation (LoRA; see the sketch after this list)
    • Quantization: bfloat16, 8-bit optimizers, 4-bit QLoRA (see the toy quantization sketch after this list)
    • Transformer upgrades: Rotary Position Embedding (RoPE), Attention with Linear Biases (ALiBi), Grouped-Query Attention (GQA) (see the RoPE sketch after this list)
    • Open pre-training, instruction-tuning, and alignment-tuning data; data augmentation (Self-Instruct)
    • Evaluation: Massive Multitask Language Understanding (MMLU)
    • Engineering: Data parallelism + Tensor parallelism + Pipeline parallelism = 3D-parallel training
    • Future directions: Multi-modality, Mixture-of-Experts (MoE), Retrieval Augmented Generation (RAG)
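
    A minimal chain-of-thought prompting sketch: the few-shot demonstration spells out the intermediate reasoning, which the model then imitates before giving its final answer. The word problems and their wording below are made up for illustration, and any instruction-following LLM could take this string as input.

```python
# Illustrative few-shot chain-of-thought prompt (toy arithmetic word problems).
cot_prompt = """Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. How many apples do they have?
A: They started with 23 apples. Using 20 leaves 23 - 20 = 3. Buying 6 more gives 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans with 3 tennis balls each. How many tennis balls does he have now?
A:"""

print(cot_prompt)  # the worked example nudges the model to reason step by step before answering
```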
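
    A minimal LoRA sketch, assuming PyTorch and an `nn.Linear` base layer: the pretrained weight is frozen and only a rank-r update B·A is trained. The class name `LoRALinear` and the rank/alpha defaults are illustrative choices, not taken from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B = 0, so the model starts unchanged
        self.scaling = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x, with only A and B receiving gradients
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                  # same output shape as the frozen base layer
```

    After training, the low-rank update can be folded into the base weight, so inference adds no extra cost.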
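
    A toy sketch of symmetric, per-tensor int8 quantization, just to show the storage idea behind 8-bit methods. This is not the bitsandbytes or QLoRA algorithm (those use block-wise scales and, in QLoRA's case, a 4-bit NormalFloat datatype); the function names are made up for illustration.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Store weights as int8 plus a single float scale (symmetric, per-tensor)."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print((w - w_hat).abs().max())   # reconstruction error; storage drops from 4 bytes to 1 byte per weight
```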
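
    A minimal RoPE sketch, assuming a `(seq_len, dim)` query or key tensor with even `dim`: each channel pair is rotated by an angle proportional to the token position, so the query-key dot product ends up depending on relative position. The base of 10000 follows the usual convention; the function itself is an illustrative re-implementation, not any model's actual code.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (shape: seq_len x dim, dim even) by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)      # per-pair rotation frequencies
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = apply_rope(torch.randn(16, 64))   # queries and keys are rotated before the attention dot product
```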