DeepLearning Chapter 5 | 冰芯糖果屋

Type: Post
Date: Dec 17, 2025
Slug: llm/learning/3Blue1Brown/chapter5
Summary: Notes on 3Blue1Brown's LLM-related videos
Status: Published
Tags: LLM, AI
Category: 技术茶点 (Tech Snacks)


3Blue1Brown YouTube channel
My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and for difficult problems to be made simple with changes in perspective. For more information, other projects, FAQs, and inquiries see the website: https://www.3blue1brown.com
https://www.youtube.com/@3blue1brown/videos

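Chapter	Core content
Deep Learning Chapter 1	What a neural network is
Deep Learning Chapter 2	Gradient descent, how neural networks learn
Deep Learning Chapter 3	An intuitive look at backpropagation
Deep Learning Chapter 4	The calculus details of backpropagation
Deep Learning Chapter 5	Introduction to Transformers and LLMs
Deep Learning Chapter 6	The attention mechanism in detail
Deep Learning Chapter 7	How Transformers store knowledge and facts
Additional	Explanations of Diffusion Models, image generation, and more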

Transformers, the tech behind LLMs | Deep Learning Chapter 5


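The Transformer is a neural network architecture designed for sequence modeling. It drops the recurrent computation of RNNs and is built around linear transformations in vector space together with the attention mechanism.

In a Transformer:

The input sequence is first mapped by an Embedding into a high-dimensional vector space, turning discrete items such as words and symbols into geometric objects that can be computed with;

With Self-Attention, the model relates any two positions in the sequence within a single step, without passing information along position by position;

Attention computes similarity scores between Queries and Keys, then uses them to take a weighted sum of Values, dynamically deciding which context matters most for the current representation;

This design gives the Transformer natural support for parallel computation, long-range dependency modeling, and scaling to very large parameter counts, which is why it became the base architecture of modern large language models (LLMs).

The essence of the Transformer is not "a more complicated neural network"; it recasts the problem of "understanding a sequence" as a geometric question of "how vectors attend to one another".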
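
The Query, Key, Value description above can be sketched in a few lines. This is only a minimal single-head illustration in plain Python, not the video's or any library's implementation: the names `self_attention`, `matvec`, and the identity weight matrices are made up for this note, and real models add masking, positional information, multiple heads, and learned weights.

```python
import math

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over token vectors X."""
    Q = [matvec(W_q, x) for x in X]
    K = [matvec(W_k, x) for x in X]
    V = [matvec(W_v, x) for x in X]
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
    return outputs

# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # invented weights, chosen for readability
attended = self_attention(X, identity, identity, identity)
```

Every row of `attended` is a mix of all three value vectors, which is the sense in which each position can look at every other position in one step.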


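1. Word Embeddings and Semantic Space

The video uses examples like "E(mother) - E(father) ≈ E(woman) - E(man)" to show that:

A word embedding is the result of mapping text into a high-dimensional vector space.

In semantic space, geometric relationships between vectors carry linguistic regularities (such as gender or tense).

A vector difference can express a semantic relationship, so the model can capture conceptual structure through vector arithmetic.

This is one of the core ideas in deep learning: representing relationships between concepts as geometric relationships.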
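
To make the vector-difference idea concrete, here is a toy sketch. The 3-dimensional vectors in `E` are invented by hand so that the arithmetic works out exactly; real embeddings are learned, have hundreds or thousands of dimensions, and only satisfy such relations approximately. The helper names (`sub`, `add`, `nearest`) are hypothetical.

```python
import math

# Hand-made "embeddings"; the numbers are invented for illustration only.
# Axis 0 loosely encodes gender, axis 1 parenthood, axis 2 is filler.
E = {
    "man":    [1.0, 0.0, 0.2],
    "woman":  [-1.0, 0.0, 0.2],
    "father": [1.0, 1.0, 0.3],
    "mother": [-1.0, 1.0, 0.3],
}

def sub(v, w):
    return [a - b for a, b in zip(v, w)]

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def distance(v, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def nearest(target, exclude=()):
    """Return the word whose vector lies closest to target."""
    candidates = [word for word in E if word not in exclude]
    return min(candidates, key=lambda word: distance(E[word], target))

# The same "gender direction" shows up in both word pairs.
gender_direction = sub(E["woman"], E["man"])
parent_difference = sub(E["mother"], E["father"])

# father + (woman - man) should land near mother.
guess = nearest(add(E["father"], gender_direction), exclude=("father",))
```

With these made-up numbers `guess` is `"mother"`, and `gender_direction` equals `parent_difference`.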

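2. Softmax and Probability Distributions

The top-right part of the video frame shows:

A set of logits (unnormalized scores) as input, for example:

[−0.8, −5.1, +2.8, +8.5, +3.4, −2.2, +8.0]

Softmax exponentiates these values and normalizes them so they sum to 1, producing a set of probabilities:

The largest logit gets the highest probability
Each other value gets probability in proportion to the exponential of its logit

Takeaways:

Softmax is the key step that turns the model's "preferences" into probabilities
It is also the basic mathematical building block of the final step in classification tasks and language models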
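
As a quick check of the description above, the following sketch applies softmax to the logits listed in this section. It is written in plain Python; subtracting the maximum before exponentiating is a common numerical-stability trick that I am assuming here, not something stated in the video notes.

```python
import math

def softmax(logits):
    """Exponentiate each logit and normalize so the results sum to 1."""
    m = max(logits)  # shifting by the max does not change the result
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-0.8, -5.1, 2.8, 8.5, 3.4, -2.2, 8.0]
probs = softmax(logits)

# Index of the most likely entry.
best = max(range(len(probs)), key=lambda i: probs[i])
```

Here `best` is 3, the position of the logit +8.5. The two largest logits, +8.5 and +8.0, receive roughly 0.62 and 0.38 of the probability, and everything else is squeezed close to zero.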

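3. Dot Product and Similarity

The bottom-left part explains what the dot product means:

Definition of the dot product:

v · w = v₁w₁ + v₂w₂ + … + vₙwₙ

The dot product can be interpreted as:

The length of one vector's projection onto the direction of the other, multiplied by the other's length
A measure of how similar two vectors are (large and positive when they point the same way, zero when perpendicular, negative when they point in opposite directions)

In deep learning:

The attention mechanism also uses the dot product to measure how well a Query matches a Key
Word-vector similarity is also often measured with the dot product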
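
The two readings of the dot product can be checked numerically. This is a small sketch with made-up 2-dimensional vectors; `cosine_similarity` is included as an assumption of mine, since dividing out the lengths is a common way to compare direction only.

```python
import math

def dot(v, w):
    """v . w = v1*w1 + v2*w2 + ... + vn*wn"""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine_similarity(v, w):
    """Dot product with both lengths divided out, so only direction matters."""
    return dot(v, w) / (norm(v) * norm(w))

v = [3.0, 4.0]
w = [2.0, 0.0]

# Signed length of v's projection onto the direction of w.
projection_length = dot(v, w) / norm(w)

same_direction = dot([1.0, 0.0], [2.0, 0.0])   # positive
perpendicular = dot([1.0, 0.0], [0.0, 1.0])    # zero
opposite = dot([1.0, 0.0], [-1.0, 0.0])        # negative
```

For these vectors `dot(v, w)` is 6.0, which equals `projection_length` (3.0) times `norm(w)` (2.0).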

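4. Matrix Multiplication and Neural Networks

The bottom-right part shows matrix multiplication in a neural network:

An input vector is mapped into a new space by a weight matrix W:

y = W x

This is the most fundamental operation in a neural network:

Each layer performs a linear transformation followed by a nonlinear activation

Every parameter (weight) is trainable and is learned from data

The video emphasizes:

Most of the computation in a neural network is, at bottom, a large number of matrix multiplications
These transformations are mappings and reshapings of vector space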
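
A single layer of "linear transformation plus nonlinear activation" might look like the sketch below. The weights in `W` are invented numbers rather than trained values, ReLU is just one common choice of activation that I am assuming, and the bias term is left out to match the y = W x form above.

```python
def matvec(W, x):
    """Compute y = W x, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Elementwise nonlinearity: negative entries become 0."""
    return [max(0.0, a) for a in v]

def layer(W, x):
    """One layer: linear map followed by a nonlinear activation."""
    return relu(matvec(W, x))

# A made-up 3x2 weight matrix maps a 2-dimensional input to 3 dimensions.
W = [
    [1.0, -1.0],
    [0.5, 0.5],
    [-1.0, 0.0],
]
x = [2.0, 1.0]

linear_output = matvec(W, x)  # [1.0, 1.5, -2.0]
activated = layer(W, x)       # [1.0, 1.5, 0.0]
```

In training, the entries of `W` are the parameters that get adjusted from data; here they are fixed only to show the shape of the computation.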

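5. Understanding Deep Learning from a Geometric View

The overall core idea of the video is to make complex neural network computation visual and geometric:

Vectors represent concepts

Linear transformations change the representation space

Dot products determine similarity

Softmax normalizes scores into probabilities

The whole computation of a neural network can be understood as a series of geometric operations carried out one after another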
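
The four steps listed in this section can be chained into one tiny pipeline. This is a deliberately simplified sketch and not how a real language model is built: the vocabulary, the 2-dimensional vectors in `embedding`, and the matrix `W` are all hypothetical, and reusing the embeddings to score the output is an assumption made to keep the example short.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def matvec(W, x):
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Step 1: vectors represent concepts (invented 2-dimensional embeddings).
vocab = ["cat", "dog", "car"]
embedding = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}

# Step 2: a linear transformation changes the representation space.
W = [[2.0, 0.0], [0.0, 2.0]]  # made-up weights
hidden = matvec(W, embedding["cat"])

# Step 3: dot products measure similarity against every vocabulary vector.
logits = [dot(hidden, embedding[token]) for token in vocab]

# Step 4: softmax turns the scores into a probability distribution.
probs = softmax(logits)
ranking = sorted(vocab, key=lambda t: probs[vocab.index(t)], reverse=True)
```

With these numbers the ranking is cat, dog, car: "dog" sits close to "cat" in the toy space, so it scores well above "car".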

About GPT

Generative / Pre-trained / Transformer


Transformer

Multimodal


Inside


Author:沈林曦
URL:https://blog.aibhtt.com/article/llm/learning/3Blue1Brown/chapter5
Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!
