Tool mentioned on podcasts

CRATE

Mentioned on 1 episode by 1 guest across our covered podcasts.

Who mentioned it

  • Yi Ma, Author
    White-Box Transformers (CRATE): Multi-head self-attention emerges mathematically as gradient steps optimizing rate reduction objectives, while MLPs function as sparsification operators. This derivation eliminates dozens of hyperparameters and achieves linear time complexity versus quadratic in standard transformers, enabling principled architecture design rather than empirical search.
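
The two ingredients named in the summary above, a gradient step on a rate reduction objective and a sparsification step, can be illustrated with a toy NumPy sketch. This is not the CRATE implementation. It assumes the standard coding-rate formula R(Z) = 1/2 logdet(I + d/(n eps^2) Z Z^T) and a plain ISTA (soft-thresholding) update for sparse coding. The function names, the dictionary D, and the values of eps, step, and lam are all hypothetical choices made for illustration.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + alpha * Z Z^T), Z of shape (d, n)."""
    d, n = Z.shape
    alpha = d / (n * eps**2)
    _, logdet = np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)
    return 0.5 * logdet


def coding_rate_grad(Z, eps=0.5):
    """Gradient of coding_rate with respect to Z: alpha * (I + alpha Z Z^T)^{-1} Z."""
    d, n = Z.shape
    alpha = d / (n * eps**2)
    return alpha * np.linalg.solve(np.eye(d) + alpha * Z @ Z.T, Z)


def compress_step(Z, step=0.1, eps=0.5):
    """One gradient-descent step that lowers the coding rate of the tokens."""
    return Z - step * coding_rate_grad(Z, eps)


def lasso_objective(A, X, D, lam):
    """Sparse-coding objective: 1/2 * ||X - D A||_F^2 + lam * ||A||_1."""
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))


def ista_step(A, X, D, lam=0.1):
    """One ISTA step: gradient step on the quadratic term, then soft-threshold."""
    eta = 1.0 / np.linalg.norm(D, 2) ** 2  # step size 1/L, L = ||D||_2^2
    B = A - eta * D.T @ (D @ A - X)
    return np.sign(B) * np.maximum(np.abs(B) - eta * lam, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((4, 16))   # 16 toy tokens of dimension 4
    Z_half = compress_step(Z)          # "attention-like" compression step
    D = rng.standard_normal((4, 4))    # hypothetical dictionary
    A = ista_step(Z_half, Z_half, D)   # "MLP-like" sparsification step
    print(coding_rate(Z), coding_rate(Z_half), np.mean(A == 0.0))
```

The sketch compresses all tokens against a single global coding rate, whereas the description above refers to multi-head attention, so a faithful version would apply the same kind of step per head. The toy version also does nothing to change the quadratic cost of forming token interactions; it only shows the shape of the two update rules.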