Learning Notes
📖 Deep-Dive Study Notes
Deep-dive study notes from my journey learning AI/ML — from LLM fundamentals to reasoning models. Written in Chinese, these notes break down complex concepts into digestible chapters.
📚 LLM Study Notes
Building LLMs from Scratch
Study notes from Build a Large Language Model From Scratch by Sebastian Raschka. From tokenization through attention, pretraining, fine-tuning, and LoRA — a hands-on journey building GPT from the ground up.
7 Chapters
Chinese
Based on Sebastian Raschka
Ch2
Tokenization and Text Processing: the First Step of an LLM
From regex tokenization to BPE encoding, sliding window data loading, and generating embedding vectors for LLM input.
✅ Read →
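The sliding-window data loading mentioned above can be sketched in a few lines of plain Python (the function name and signature here are mine, not the book's):

```python
def sliding_windows(token_ids, context_len, stride):
    """Slice a token stream into (input, target) training pairs.

    The target is the input shifted one position to the right, so each
    position learns to predict the next token. stride controls overlap
    between consecutive windows.
    """
    pairs = []
    for start in range(0, len(token_ids) - context_len, stride):
        inp = token_ids[start : start + context_len]
        tgt = token_ids[start + 1 : start + context_len + 1]
        pairs.append((inp, tgt))
    return pairs
```

With `stride == context_len` the windows don't overlap; a smaller stride yields more (overlapping) training examples from the same text.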
Ch3
Attention Mechanisms: the Core Engine of LLMs
From dot-product similarity through four iterations to Multi-Head Causal Attention used in GPT.
✅ Read →
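The "causal" part of causal attention boils down to masking future positions before the softmax. A minimal pure-Python illustration (working from precomputed dot-product scores; real implementations vectorize this with tensors):

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over a score matrix with a causal mask.

    scores[i][j] is the scaled dot product of query i with key j.
    Positions j > i are masked to -inf, so token i can only attend
    to tokens 0..i and each row of weights sums to 1.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # mask out future positions, then softmax the visible ones
        row = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(row[: i + 1])  # subtract the max for numerical stability
        exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```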
Ch4
Implementing the GPT Model Architecture from Scratch
Assembling LayerNorm, GELU, FeedForward, and residual connections into a complete GPT decoder-only model, block by block.
✅ Read →
Ch5
Pretraining: Teaching the Model Language
Pretraining via next-token prediction, text generation strategies (temperature/top-k), and learning rate scheduling.
✅ Read →
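The temperature/top-k generation strategies mentioned above combine naturally: scale the logits, keep only the k largest, then softmax. A hedged sketch (function name mine; shown on plain lists rather than tensors):

```python
import math

def top_k_temperature(logits, k, temperature):
    """Turn raw logits into sampling probabilities.

    Divides logits by temperature (>1 flattens, <1 sharpens), masks
    everything outside the top k to -inf, and softmaxes the survivors.
    """
    scaled = [l / temperature for l in logits]
    kth = sorted(scaled, reverse=True)[k - 1]   # k-th largest value
    filtered = [s if s >= kth else float("-inf") for s in scaled]
    m = max(filtered)
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in filtered]
    z = sum(exps)
    return [e / z for e in exps]
```

The returned distribution would then be sampled from (e.g. with `random.choices`) to pick the next token.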
Ch6
Hands-On Fine-Tuning for Text Classification
Fine-tuning GPT for spam classification with head replacement and layer freezing, achieving 95.67% test accuracy.
✅ Read →
Ch7
Instruction Fine-Tuning: from Base Model to Assistant
Instruction fine-tuning using the Alpaca template and custom collate functions to turn a base model into an assistant.
✅ Read →
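For reference, the Alpaca-style prompt formatting looks roughly like this. A sketch only: the real template uses a slightly different preamble when an input field is present, and the helper name is mine:

```python
def format_alpaca(instruction, inp=""):
    """Render an instruction (and optional input) in the Alpaca prompt style.

    The model is fine-tuned to continue the text after '### Response:'.
    """
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{instruction}"
    )
    if inp:
        prompt += f"\n\n### Input:\n{inp}"
    prompt += "\n\n### Response:\n"
    return prompt
```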
A.E
LoRA: the Secret Weapon of Efficient Fine-Tuning
Low-rank adaptation (LoRA): training only 2.6M params (2.1% of GPT-2) via small bypass matrices, while matching full fine-tuning.
✅ Read →
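The parameter savings follow directly from the bypass shapes: a frozen `d_in × d_out` weight gets a trainable `d_in × r` matrix A and `r × d_out` matrix B. A quick back-of-the-envelope helper (illustrative numbers, not the book's exact accounting):

```python
def lora_param_fraction(d_in, d_out, rank):
    """Fraction of a weight matrix's parameters that LoRA actually trains.

    Full weight: d_in * d_out params (frozen).
    LoRA bypass: A (d_in x r) plus B (r x d_out) = r * (d_in + d_out) params.
    """
    return rank * (d_in + d_out) / (d_in * d_out)
```

For a square 768-dimensional layer (GPT-2's hidden size) with rank 8, this comes out to about 2%, which is the right order of magnitude for the ~2.1% figure above.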
🧠 Reasoning Model Notes
Building Reasoning Models from Scratch
Study notes from Build a Reasoning Model (From Scratch) by Sebastian Raschka. From Qwen3 text generation through evaluation, inference-time scaling, GRPO reinforcement learning, and reasoning distillation.
8 Chapters
Chinese
Based on Sebastian Raschka
Ch2
Generating Text with Qwen3
Loading pretrained Qwen3 0.6B, tokenizer encode/decode, token-by-token generation, accelerated with KV Cache and torch.compile.
✅ Read →
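The KV-cache speedup comes from a simple idea: keep each past token's key/value projections around so decoding step t only computes projections for the new token. A toy sketch of the bookkeeping (real caches store per-layer, per-head tensors):

```python
class KVCache:
    """Minimal key/value cache for one attention layer.

    Each decode step appends only the new token's keys/values; earlier
    tokens' projections are reused instead of recomputed.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, new_keys, new_values):
        """Append this step's projections and return the full history."""
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return self.keys, self.values
```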
Ch3
How to Evaluate Reasoning Models
Building a math verification pipeline — answer extraction, LaTeX normalization, SymPy equivalence checking, and MATH-500 benchmark.
✅ Read →
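The answer-extraction step of such a pipeline often keys on the `\boxed{...}` convention used in MATH-style solutions. A simplified sketch (the real pipeline also handles nested braces and LaTeX normalization):

```python
import re

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in a model response.

    Takes the last match because models often restate intermediate boxed
    values before the final answer. Returns None if no box is found.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None
```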
Ch4
Inference-Time Scaling: Wisdom at Inference Time
Three inference-time techniques (Temperature Scaling, Top-p, and Self-Consistency) that boost MATH-500 accuracy from 15% to 52%.
✅ Read →
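Of the three, self-consistency is the simplest to sketch: sample several reasoning paths, extract each final answer, and take the majority vote. A minimal version (the answer-extraction step is assumed to have already happened):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over final answers from independently sampled
    reasoning paths; the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]
```

The intuition: wrong reasoning paths tend to scatter across many different wrong answers, while correct paths converge on the same one.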
Ch5
Self-Refinement: the More the Model Thinks, the Better It Gets
Self-Refinement loop (generate → critique → revise) with heuristic scoring and log-probability confidence scoring.
✅ Read →
Ch6
Training Reasoning Ability with GRPO Reinforcement Learning
Full RLVR training loop with GRPO — rollout sampling, reward, advantage, policy gradient loss. 50 steps: 15.2% → 47.4%.
✅ Read →
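The "advantage" step in that loop is GRPO's signature move: instead of a learned value baseline, each rollout's reward is standardized against the other rollouts in its group. A sketch of the standard formulation (epsilon added for numerical safety, as is common):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO advantage: z-score each reward within its rollout group.

    Rollouts that beat the group mean get positive advantage, ones
    below it get negative, so the policy needs no critic network.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```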
Ch7
Improving GRPO: More Stable Training
Stabilizing GRPO with clipped policy ratio, KL regularization, format rewards, and 15+ frontier techniques (DAPO, Dr. GRPO).
✅ Read →
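The clipped policy ratio works the same way as in PPO: cap how far the new policy's probability ratio can pull a single update. A per-token sketch of the surrogate objective (this is the quantity to maximize; the training loss is its negative, and real implementations operate on tensors):

```python
def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one token.

    ratio = pi_new(token) / pi_old(token). Clamping the ratio to
    [1 - eps, 1 + eps] bounds how much any one sample can move the
    policy, which is what stabilizes GRPO updates.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)
```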
Ch8
Reasoning Distillation: Small Models Can Think Deeply Too
Reasoning distillation via SFT — data generation pipeline and the distillation vs RL tradeoff based on DeepSeek-R1.
✅ Read →
A.C
Qwen3 Source Code Deep Dive
Layer-by-layer Qwen3 source walkthrough — RMSNorm, SwiGLU, RoPE, GQA, TransformerBlock, KVCache, and Tokenizer.
✅ Read →
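As a taste of that walkthrough, RMSNorm is the easiest piece to show in isolation: unlike LayerNorm it skips mean-centering and just rescales by the reciprocal root-mean-square. A scalar-level sketch (the actual module works on tensors and a learned weight parameter):

```python
def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: divide by the RMS of the activations,
    then apply the learned per-dimension scale. No mean subtraction,
    no bias, which makes it cheaper than LayerNorm."""
    rms = (sum(v * v for v in x) / len(x) + eps) ** 0.5
    return [w * v / rms for v, w in zip(x, weight)]
```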
📄 Research Paper Notes
Paper Analysis & Commentary
Personal analysis and commentary on notable AI/ML research papers. Breaking down key ideas, evaluating experimental results, and exploring practical implications for the agent ecosystem.
1 Paper
English
Research Analysis
P.1
SKILL0: From Reading Instructions to Actually Learning
How In-Context RL internalizes agent skills into model parameters, achieving 87.9% success with 5.8x fewer tokens and zero skill retrieval at inference.
✅ Read →