H-Nets
H-Nets are:
- an extension of LLMs (they predict the next $thing)
- a dynamic tokenizer (they define token chunks on-the-fly)
- recursively defined (you can do nested tokenization)
Hence, “Dynamic Chunking for End-to-End Hierarchical Sequence Modeling”.
I think H-Nets make an awful lot of sense, so I’ve done lots of research with them.
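To make the "dynamic tokenizer" point concrete, here is a toy sketch of the chunking step, loosely following the paper's cosine-similarity routing: a routing module scores each position as a chunk boundary, and the hidden states between boundaries are mean-pooled into chunk vectors for the next stage up. The function and projection names are mine, not from the reference implementation, and everything else (encoder, main network, dechunking, the training losses) is left out.

```python
# Toy sketch of dynamic chunking (illustrative, not the reference code).
import torch
import torch.nn.functional as F

def chunk(hidden, q_proj, k_proj):
    """hidden: (seq, dim) byte-level hidden states.
    Returns mean-pooled chunk vectors and the boundary mask."""
    q = hidden @ q_proj
    k = hidden @ k_proj
    # Boundary probability: low cosine similarity between adjacent positions
    # suggests a new chunk starts here; position 0 always starts a chunk.
    cos = F.cosine_similarity(q[1:], k[:-1], dim=-1)
    p = torch.cat([torch.ones(1), 0.5 * (1.0 - cos)])
    boundaries = p > 0.5
    # Assign each position to its chunk, then mean-pool within chunks.
    chunk_id = torch.cumsum(boundaries.long(), dim=0) - 1
    n_chunks = int(chunk_id.max().item()) + 1
    pooled = torch.zeros(n_chunks, hidden.shape[-1]).index_add_(0, chunk_id, hidden)
    counts = torch.bincount(chunk_id, minlength=n_chunks).clamp(min=1).unsqueeze(-1)
    return pooled / counts, boundaries

dim = 16
hidden = torch.randn(32, dim)
q_proj, k_proj = torch.randn(dim, dim), torch.randn(dim, dim)
chunks, boundaries = chunk(hidden, q_proj, k_proj)
print(chunks.shape, boundaries.sum().item())  # (n_chunks, dim), number of chunks
```

Applying the same step to the pooled chunk vectors again is the "nested tokenization" case.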
2025
- H-Net - Scaling Laws (Byte) · 3380 words
- H-Net - Parameterization (S≥1) · 3638 words
- H-Net - Intuitions · 2180 words
- H-Net - Engineering (1gpu) · 1970 words
- H-Net - Parameterization (Baseline) · 1895 words
- H-Net - Inference · 1913 words