Fast Estimation of the Spectral Norm of Random Matrices


This is a gemini-2.5-flash translation of a Chinese article.

It has NOT been vetted for errors. You should have the original article open in a parallel tab at all times.

By Su Jianlin | 2025-10-12

In the “Approximate Estimation” section of 《Higher-Order MuP: Simpler but More Sophisticated Spectral Condition Scaling》, we previously presented a conclusion: “For a random matrix of size $n \times m$ following a standard normal distribution, its spectral norm is approximately $\sqrt{n}+\sqrt{m}$.”

In this article, we will further discuss this conclusion and provide a fast estimation method for the spectral norm of random matrices.

Random Matrix Theory

Consider a random matrix $\boldsymbol{W}\in\mathbb{R}^{n\times m}$, where each element is sampled independently and identically from a standard normal distribution $\mathcal{N}(0,1)$. We aim to estimate the spectral norm of $\boldsymbol{W}$, which is its largest singular value, and we take $\mathbb{E}[\Vert\boldsymbol{W}\Vert_2]$ as the final estimation result.
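For concreteness, the exact quantity being estimated can be computed directly (a minimal numpy check of my own; `numpy.linalg.norm(W, 2)` returns the largest singular value of a matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256
W = rng.standard_normal((n, m))   # entries i.i.d. N(0, 1)

# The exact spectral norm is the largest singular value of W.
print(np.linalg.norm(W, 2))       # empirically close to the value below
print(np.sqrt(n) + np.sqrt(m))    # the sqrt(n) + sqrt(m) estimate: 48.0
```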

First, it should be pointed out that the analysis of random matrix properties has grown into a specialized branch of its own. For the estimation of the spectral norm of Gaussian random matrices, relevant keywords include “Marchenko–Pastur distribution”, “Bai-Yin law”, “Gordon’s theorem”, etc., which cover detailed results on the distribution of singular values. However, we will not delve into these topics; instead, we will introduce a way to quickly arrive at $\sqrt{n}+\sqrt{m}$.

This method is a simplification by the author based on “Section 5.3.1” of 《Introduction to the non-asymptotic analysis of random matrices》. It is essentially a “popular science” explanation, aiming to provide a quick understanding of the origin of $\sqrt{n}+\sqrt{m}$. A rigorous proof would require many tedious and monotonous details, which will not be elaborated upon here.

Spectral Norm Estimation

Our starting point is the identity

$$ \begin{equation}\Vert\boldsymbol{W}\Vert_2 = \max_{\Vert \boldsymbol{u}\Vert=1, \Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\end{equation} $$

where $\boldsymbol{u}\in\mathbb{R}^n,\boldsymbol{v}\in\mathbb{R}^m$. Analyzing $\Vert\boldsymbol{W}\Vert_2$ directly through this expression is not easy, so it is natural to look for an approximation. We consider the following two “half-finished products”:

$$ \begin{equation}\max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\qquad\qquad \max_{\Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\end{equation} $$

Intuitively, compared with the identity above, each of these expressions does only half of the work. We make a bold assumption: each therefore also delivers half of the complete result, so adding the two together should give a good approximation of the whole:

$$ \begin{equation}\Vert\boldsymbol{W}\Vert_2 = \max_{\Vert \boldsymbol{u}\Vert=1, \Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \approx \max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\Vert \boldsymbol{v}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = \Vert \boldsymbol{W}\boldsymbol{v}\Vert + \Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert \end{equation} $$

That is, by sampling $\boldsymbol{u}$ and $\boldsymbol{v}$ from the unit hyperspheres of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, we can obtain an approximation of $\boldsymbol{W}$’s spectral norm according to the above equation.
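A minimal numpy sketch of this estimator (the function name is mine, and I normalize Gaussian samples to obtain uniformly distributed points on the unit spheres; this is an illustration of the idea, not code from the original article):

```python
import numpy as np

def fast_spectral_norm_estimate(W, rng):
    """Estimate ||W||_2 as ||W v|| + ||u^T W|| with random unit vectors u, v."""
    n, m = W.shape
    u = rng.standard_normal(n)
    v = rng.standard_normal(m)
    u /= np.linalg.norm(u)   # a normalized Gaussian vector is uniform on the unit sphere
    v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v) + np.linalg.norm(u @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 256))
print(fast_spectral_norm_estimate(W, rng))   # should be close to sqrt(1024) + sqrt(256) = 48
print(np.linalg.norm(W, 2))                  # exact value, for comparison
```

Unlike the exact computation, which requires an SVD, the estimate needs only two matrix-vector products.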

Calculating Expected Values

With this approximation, and using the fact that $\Vert \boldsymbol{W}\boldsymbol{v}\Vert$ and $\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert$ concentrate around their means (so that $\mathbb{E}[X]\approx\sqrt{\mathbb{E}[X^2]}$), we can calculate

$$ \begin{equation}\mathbb{E}[\Vert\boldsymbol{W}\Vert_2]\approx\mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert] + \mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert] \approx \sqrt{\mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert^2]} + \sqrt{\mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert^2]}\end{equation} $$

where

$$ \begin{equation}\mathbb{E}[\Vert \boldsymbol{W}\boldsymbol{v}\Vert^2] = \mathbb{E}[ \boldsymbol{v}^{\top}\boldsymbol{W}^{\top}\boldsymbol{W}\boldsymbol{v}] = \boldsymbol{v}^{\top}(n\boldsymbol{I}_m)\boldsymbol{v} = n\end{equation} $$

Similarly, $\mathbb{E}[\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert^2]=m$, thus

$$ \begin{equation}\mathbb{E}[\Vert\boldsymbol{W}\Vert_2]\approx\sqrt{n} + \sqrt{m}\end{equation} $$
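As a sanity check, the two intermediate expectations can be verified by Monte Carlo (a small sketch of my own; since the identities hold for any fixed unit vectors, standard basis vectors are used for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 300, 100, 2000

# Fixed unit vectors; any unit vectors would do, since the expectation is over W.
v = np.zeros(m); v[0] = 1.0
u = np.zeros(n); u[0] = 1.0

Wv2 = np.empty(trials)
uW2 = np.empty(trials)
for t in range(trials):
    W = rng.standard_normal((n, m))
    Wv2[t] = np.linalg.norm(W @ v) ** 2   # its mean should approach n
    uW2[t] = np.linalg.norm(u @ W) ** 2   # its mean should approach m
print(Wv2.mean(), n)   # roughly 300
print(uW2.mean(), m)   # roughly 100
```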

This result is actually very accurate (which can be verified through simulation experiments). Specifically, if $n=ka,m=kb$, where $a,b$ are constants, then

$$ \begin{equation}\lim_{k\to\infty} \frac{\Vert\boldsymbol{W}\Vert_2}{\sqrt{n} + \sqrt{m}} = 1,\qquad W_{ij}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\end{equation} $$

The reason it is so accurate is that we “cheated”: the crucial approximation step is essentially reverse-engineered from the known correct answer. Beyond that, the only properties we used are $\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]=n\boldsymbol{I}_m$ and $\mathbb{E}[\boldsymbol{W}\boldsymbol{W}^{\top}]=m\boldsymbol{I}_n$, so the approximation can be expected to hold for any i.i.d. entry distribution with zero mean and unit variance.
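The limit can be illustrated numerically (a quick simulation sketch; the aspect ratio 3:1 and the sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3, 1   # keep the aspect ratio n : m = 3 : 1 fixed while k grows
for k in (10, 100, 1000):
    n, m = k * a, k * b
    W = rng.standard_normal((n, m))
    ratio = np.linalg.norm(W, 2) / (np.sqrt(n) + np.sqrt(m))
    print(f"k={k:4d}  ratio={ratio:.4f}")   # the ratio drifts toward 1 as k increases
```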

Minimum Singular Value

The spectral norm is the largest singular value. In fact, the same approach can be used to estimate the minimum singular value. Of course, the term “minimum” here needs a more careful definition: assume $n\geq m$; the minimum singular value then refers to the $m$-th singular value of $\boldsymbol{W}$ ordered from largest to smallest, and it equals

$$ \begin{equation}\sigma_{\min}(\boldsymbol{W}) = \min_{\Vert \boldsymbol{v}\Vert=1}\max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\end{equation} $$

Note that the $\min$ and $\max$ here cannot be interchanged, nor can the variables they act on. As before, we take the two “half-finished products”, one from the $\min$ and one from the $\max$, and use their sum as an approximation:

$$ \begin{equation}\sigma_{\min}(\boldsymbol{W}) = \min_{\Vert \boldsymbol{v}\Vert=1}\max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\approx \min_{\Vert \boldsymbol{v}\Vert=1}\boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\Vert \boldsymbol{u}\Vert=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = -\Vert \boldsymbol{u}^{\top}\boldsymbol{W}\Vert + \Vert \boldsymbol{W}\boldsymbol{v}\Vert \end{equation} $$

The rest follows the same logic as before, and the result is $\mathbb{E}[\sigma_{\min}(\boldsymbol{W})]\approx\sqrt{n}-\sqrt{m}$.
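A quick numerical check of this estimate (again a sketch of my own; `numpy.linalg.svd` returns singular values in descending order, so the last one is the minimum):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3000, 1000
W = rng.standard_normal((n, m))

sigma_min = np.linalg.svd(W, compute_uv=False)[-1]   # m-th (smallest) singular value
print(sigma_min)                  # empirically close to the value below
print(np.sqrt(n) - np.sqrt(m))    # the sqrt(n) - sqrt(m) estimate, about 23
```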

Summary

This article gave a quick way to estimate the spectral norm of a random matrix, or, more precisely, a popular-science, heuristic explanation rather than a rigorous derivation. It can be made rigorous, but that would require many additional theoretical details, all of which have been omitted here.

@online{kexuefm-11335,
        title={Fast Estimation of the Spectral Norm of Random Matrices},
        author={苏剑林},
        year={2025},
        month={10},
        url={\url{https://kexue.fm/archives/11335}},
}