H-Nets
H-Nets are
- an extension of LLMs (they predict the next $thing)
- a dynamic tokenizer (they define token chunks on-the-fly)
- recursively defined (you can do nested tokenization)
Hence the paper title, “Dynamic Chunking for End-to-End Hierarchical Sequence Modeling”.
I think H-Nets make an awful lot of sense, so I’ve done lots of research with them.
2025
- H-Net - Scaling Laws (Byte) · 3465 words
- H-Net - Parameterization (S≥1) · 3762 words
- H-Net - Intuitions · 2180 words
- H-Net - Engineering (1gpu) · 2027 words
- H-Net - Parameterization (Baseline) · 1906 words
- H-Net - Inference · 1936 words