Road to a Better Transformer

These are gemini-2.5-flash-preview-04-17 translations of Jianlin Su’s Transformer升级之路 articles, vetted by native Chinese readers.

I picked 2.5 Flash for 3 reasons:

  1. it is fast & free (given only 20 articles to translate)
  2. no chunking required, output length is sufficient (and consistent)
  3. gemini models are generally better than claude/openai models at 'rote' wholesale semantic copying tasks such as translation. In my experience, it is very difficult to coax other models into avoiding summarization, omission, or reshaping of the input.

I’ve put more care into the technical accuracy of these translations than in the previous set. Nonetheless, you should be mindful of potential errors.

2025

19. The Second Type of Rotary Positional Embedding
·1241 words

2024

18. Principle for RoPE's Base Selection
·2411 words
17. Simple Thoughts on Multimodal Positional Encoding
·2732 words
16. A 'Review' of Length Extrapolation Techniques
·6415 words

2023

15. Key Normalization Boosts Length Extrapolation
·1983 words
14. When HWFA Meets ReRoPE
·1432 words
13. Inverse Use of Leaky ReRoPE
·1417 words
12. Infinitely Extrapolatable ReRoPE?
·2347 words
11. Carrying the Base-β Position to the End
·1464 words
10. RoPE is a β-Based Encoding
·2265 words
9. A New Approach to Global Length Extrapolation
·2070 words
8. Length Extrapolation and Positional Robustness
·2854 words
7. Length Extrapolation and Local Attention
·2094 words

2022

6. Completeness Analysis of Rotational Positional Encoding
·1112 words

2021

5. Linear Attention as Infinite Dimension
·1066 words
4. Rotary Position Embedding for 2D Positions
·2811 words
3. From Performer to Linear Attention
·1661 words
2. Rotary Position Embedding with Diverse Strengths
·2432 words
1. Tracing the Origins of Sinusoidal Position Encoding
·2220 words