Learning Notes
📖 Deep-Dive Study Notes
Deep-dive study notes from my journey learning AI/ML — from LLM fundamentals to reasoning models. Written in Chinese, these notes break down complex concepts into digestible chapters.
📚 LLM Study Notes
Building LLMs from Scratch
Study notes from Build a Large Language Model From Scratch by Sebastian Raschka. From tokenization through attention, pretraining, fine-tuning, and LoRA — a hands-on journey building GPT from the ground up.
7 Chapters
Chinese
Based on Sebastian Raschka
Ch2
Tokenization and Text Processing: the First Step of an LLM
From regex tokenization to BPE encoding, sliding window data loading, and generating embedding vectors for LLM input.
✅ Read →
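The sliding-window data loading mentioned above can be sketched in a few lines of plain Python (the function name and signature here are mine, not the book's):

```python
def sliding_windows(token_ids, context_len, stride):
    """Slice a token stream into (input, target) training pairs.

    The target is the input shifted one position to the right, so each
    position learns to predict the next token. stride controls overlap
    between consecutive windows.
    """
    pairs = []
    for start in range(0, len(token_ids) - context_len, stride):
        inp = token_ids[start : start + context_len]
        tgt = token_ids[start + 1 : start + context_len + 1]
        pairs.append((inp, tgt))
    return pairs
```

With `stride == context_len` the windows don't overlap; a smaller stride yields more (overlapping) training examples from the same text.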
Ch3
Attention Mechanisms: the Core Engine of LLMs
From dot-product similarity through four iterations to Multi-Head Causal Attention used in GPT.
✅ Read →
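The "causal" part of causal attention boils down to masking future positions before the softmax. A minimal pure-Python illustration (working from precomputed dot-product scores; real implementations vectorize this with tensors):

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over a score matrix with a causal mask.

    scores[i][j] is the scaled dot product of query i with key j.
    Positions j > i are masked to -inf, so token i can only attend
    to tokens 0..i and each row of weights sums to 1.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # mask out future positions, then softmax the visible ones
        row = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(row[: i + 1])  # subtract the max for numerical stability
        exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```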
Ch4
Implementing the GPT Model Architecture from Scratch
Assembling LayerNorm, GELU, FeedForward, and residual connections into a complete GPT decoder-only model, block by block.
✅ Read →
Ch5
Pretraining: Teaching the Model Language
Pretraining via next-token prediction, text generation strategies (temperature/top-k), and learning rate scheduling.
✅ Read →
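The temperature/top-k generation strategies mentioned above combine naturally: scale the logits, keep only the k largest, then softmax. A hedged sketch (function name mine; shown on plain lists rather than tensors):

```python
import math

def top_k_temperature(logits, k, temperature):
    """Turn raw logits into sampling probabilities.

    Divides logits by temperature (>1 flattens, <1 sharpens), masks
    everything outside the top k to -inf, and softmaxes the survivors.
    """
    scaled = [l / temperature for l in logits]
    kth = sorted(scaled, reverse=True)[k - 1]   # k-th largest value
    filtered = [s if s >= kth else float("-inf") for s in scaled]
    m = max(filtered)
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in filtered]
    z = sum(exps)
    return [e / z for e in exps]
```

The returned distribution would then be sampled from (e.g. with `random.choices`) to pick the next token.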
Ch6
Hands-On Fine-Tuning for Text Classification
Fine-tuning GPT for spam classification with head replacement and layer freezing, achieving 95.67% test accuracy.
✅ Read →
Ch7
Instruction Fine-Tuning: from Base Model to Assistant
Instruction fine-tuning using the Alpaca template and custom collate functions to turn a base model into an assistant.
✅ Read →
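For reference, the Alpaca-style prompt formatting looks roughly like this. A sketch only: the real template uses a slightly different preamble when an input field is present, and the helper name is mine:

```python
def format_alpaca(instruction, inp=""):
    """Render an instruction (and optional input) in the Alpaca prompt style.

    The model is fine-tuned to continue the text after '### Response:'.
    """
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{instruction}"
    )
    if inp:
        prompt += f"\n\n### Input:\n{inp}"
    prompt += "\n\n### Response:\n"
    return prompt
```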
A.E
LoRA: the Secret Weapon of Efficient Fine-Tuning
Low-rank adaptation (LoRA): training only 2.6M params (2.1% of GPT-2) via small bypass matrices, while matching full fine-tuning.
✅ Read →
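The parameter savings follow directly from the bypass shapes: a frozen `d_in × d_out` weight gets a trainable `d_in × r` matrix A and `r × d_out` matrix B. A quick back-of-the-envelope helper (illustrative numbers, not the book's exact accounting):

```python
def lora_param_fraction(d_in, d_out, rank):
    """Fraction of a weight matrix's parameters that LoRA actually trains.

    Full weight: d_in * d_out params (frozen).
    LoRA bypass: A (d_in x r) plus B (r x d_out) = r * (d_in + d_out) params.
    """
    return rank * (d_in + d_out) / (d_in * d_out)
```

For a square 768-dimensional layer (GPT-2's hidden size) with rank 8, this comes out to about 2%, which is the right order of magnitude for the ~2.1% figure above.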
🧠 Reasoning Model Notes
Building Reasoning Models from Scratch
Study notes from Build a Reasoning Model (From Scratch) by Sebastian Raschka. From Qwen3 text generation through evaluation, inference-time scaling, GRPO reinforcement learning, and reasoning distillation.
8 Chapters
Chinese
Based on Sebastian Raschka
Ch2
Generating Text with Qwen3
Loading pretrained Qwen3 0.6B, tokenizer encode/decode, token-by-token generation, accelerated with KV Cache and torch.compile.
✅ Read →
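The KV-cache speedup comes from a simple idea: keep each past token's key/value projections around so decoding step t only computes projections for the new token. A toy sketch of the bookkeeping (real caches store per-layer, per-head tensors):

```python
class KVCache:
    """Minimal key/value cache for one attention layer.

    Each decode step appends only the new token's keys/values; earlier
    tokens' projections are reused instead of recomputed.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, new_keys, new_values):
        """Append this step's projections and return the full history."""
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return self.keys, self.values
```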
Ch3
How to Evaluate Reasoning Models
Building a math verification pipeline — answer extraction, LaTeX normalization, SymPy equivalence checking, and MATH-500 benchmark.
✅ Read →
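The answer-extraction step of such a pipeline often keys on the `\boxed{...}` convention used in MATH-style solutions. A simplified sketch (the real pipeline also handles nested braces and LaTeX normalization):

```python
import re

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in a model response.

    Takes the last match because models often restate intermediate boxed
    values before the final answer. Returns None if no box is found.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None
```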
Ch4
Inference-Time Scaling: Wisdom at Inference Time
Three inference-time techniques (Temperature Scaling, Top-p, and Self-Consistency) that boost MATH-500 accuracy from 15% to 52%.
✅ Read →
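Of the three, self-consistency is the simplest to sketch: sample several reasoning paths, extract each final answer, and take the majority vote. A minimal version (the answer-extraction step is assumed to have already happened):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over final answers from independently sampled
    reasoning paths; the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]
```

The intuition: wrong reasoning paths tend to scatter across many different wrong answers, while correct paths converge on the same one.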
Ch5
Self-Refinement: the More the Model Thinks, the Better It Gets
Self-Refinement loop (generate → critique → revise) with heuristic scoring and log-probability confidence scoring.
✅ Read →
Ch6
Training Reasoning Ability with GRPO Reinforcement Learning
Full RLVR training loop with GRPO — rollout sampling, reward, advantage, policy gradient loss. 50 steps: 15.2% → 47.4%.
✅ Read →
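The "advantage" step in that loop is GRPO's signature move: instead of a learned value baseline, each rollout's reward is standardized against the other rollouts in its group. A sketch of the standard formulation (epsilon added for numerical safety, as is common):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO advantage: z-score each reward within its rollout group.

    Rollouts that beat the group mean get positive advantage, ones
    below it get negative, so the policy needs no critic network.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```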
Ch7
Improving GRPO: More Stable Training
Stabilizing GRPO with clipped policy ratio, KL regularization, format rewards, and 15+ frontier techniques (DAPO, Dr. GRPO).
✅ Read →
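The clipped policy ratio works the same way as in PPO: cap how far the new policy's probability ratio can pull a single update. A per-token sketch of the surrogate objective (this is the quantity to maximize; the training loss is its negative, and real implementations operate on tensors):

```python
def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one token.

    ratio = pi_new(token) / pi_old(token). Clamping the ratio to
    [1 - eps, 1 + eps] bounds how much any one sample can move the
    policy, which is what stabilizes GRPO updates.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)
```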
Ch8
Reasoning Distillation: Small Models Can Think Deeply Too
Reasoning distillation via SFT — data generation pipeline and the distillation vs RL tradeoff based on DeepSeek-R1.
✅ Read →
A.C
Qwen3 Source Code Deep Dive
Layer-by-layer Qwen3 source walkthrough — RMSNorm, SwiGLU, RoPE, GQA, TransformerBlock, KVCache, and Tokenizer.
✅ Read →
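As a taste of that walkthrough, RMSNorm is the easiest piece to show in isolation: unlike LayerNorm it skips mean-centering and just rescales by the reciprocal root-mean-square. A scalar-level sketch (the actual module works on tensors and a learned weight parameter):

```python
def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: divide by the RMS of the activations,
    then apply the learned per-dimension scale. No mean subtraction,
    no bias, which makes it cheaper than LayerNorm."""
    rms = (sum(v * v for v in x) / len(x) + eps) ** 0.5
    return [w * v / rms for v, w in zip(x, weight)]
```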
📄 Research Paper Notes
Paper Analysis & Commentary
Personal analysis and commentary on notable AI/ML research papers. Breaking down key ideas, evaluating experimental results, and exploring practical implications for the agent ecosystem.
1 Paper
English
Research Analysis
P.1
SKILL0: From Reading Instructions to Actually Learning
How In-Context RL internalizes agent skills into model parameters, achieving 87.9% success with 5.8x fewer tokens and zero skill retrieval at inference.
✅ Read →