Posts
- Trajectory
[md]
You won each decision and lost the trajectory.
- AVnorm
[md]
Per-head normalization on attention outputs fixes length generalization.
- The Box
[md]
The hardest box to escape is the one you cannot see.
- Attention Normalizes the Wrong Norm
[md]
Softmax constrains the L1 norm to 1, but should constrain the L2 norm.
- People are the new oil
[md]
Compute used to be the bottleneck. Now it's people.