At its core, dimensionality reduction is about finding a lower-dimensional subspace that captures the 'maximum variance' or the most significant structure of a dataset. Imagine a 3D cloud of points that mostly lies along a tilted 2D plane; we can describe those points more efficiently by projecting them onto that plane. The mathematical engine that allows us to find these optimal planes is rooted in the study of linear transformations, specifically through the lens of eigenvalues and Singular Value Decomposition (SVD).
To understand SVD, we first examine the eigendecomposition of a square, symmetric matrix. For a covariance matrix $\mathbf{C}$, an eigenvector $\mathbf{v}$ is a nonzero direction that $\mathbf{C}$ does not rotate: the transformation only scales it by a factor $\lambda$, called the eigenvalue. This is expressed as $$\mathbf{C}\mathbf{v} = \lambda\mathbf{v}$$ where, for a unit-length $\mathbf{v}$, $\lambda$ is the variance of the data along the axis $\mathbf{v}$. Because $\mathbf{C}$ is symmetric and positive semidefinite, its eigenvalues are real and non-negative, and its eigenvectors can be chosen to be mutually orthogonal. In dimensionality reduction, we sort the eigenvalues in descending order and keep only the eigenvectors belonging to the top $k$, retaining most of the variance while discarding the low-variance directions, which often correspond to noise.
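As a minimal sketch of this sort-and-truncate step, the NumPy snippet below builds a synthetic 3D point cloud that lies close to a plane, computes its covariance matrix, and ranks the eigenpairs. The data, the seed, and names such as `latent`, `basis`, and `top_k` are illustrative assumptions rather than part of any particular dataset or library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3D data lying mostly along a tilted 2D plane (illustrative only).
latent = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])
X = latent @ basis + 0.05 * rng.normal(size=(500, 3))

# Covariance of the features (columns); np.cov subtracts the mean itself.
C = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues
# in ascending order, so we reverse them to get the largest first.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Keep the top-k directions and measure the share of variance they retain.
k = 2
top_k = eigvecs[:, :k]
explained = eigvals[:k].sum() / eigvals.sum()
print("eigenvalues:", eigvals)
print("fraction of variance kept with k=2:", explained)
```

Because the cloud is nearly planar, the third eigenvalue should come out far smaller than the first two, which is the signal that one dimension can be dropped at little cost.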
While eigendecomposition requires a square matrix, SVD generalizes the concept to any $m \times n$ matrix $\mathbf{A}$. SVD factors the matrix into three components: $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$. Here, $\mathbf{U}$ is an $m \times m$ orthogonal matrix whose columns are the left singular vectors (the 'output' space), $\mathbf{V}$ is an $n \times n$ orthogonal matrix whose columns are the right singular vectors (the 'input' space), and $\mathbf{\Sigma}$ is an $m \times n$ rectangular diagonal matrix containing the non-negative singular values $\sigma_i$ in descending order.
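The shapes of the three factors can be checked numerically. The sketch below assumes a small random matrix (the sizes `m = 6` and `n = 4` are arbitrary choices) and uses `np.linalg.svd`, which returns the singular values as a 1D array, so the rectangular $\mathbf{\Sigma}$ has to be assembled by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.normal(size=(m, n))

# full_matrices=True returns U as m x m and Vt (that is, V transposed) as n x n.
# s is a 1D array of singular values, already sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the m x n rectangular diagonal matrix Sigma from s.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# Multiplying the factors back together should recover A up to rounding.
A_rebuilt = U @ Sigma @ Vt
print(U.shape, Sigma.shape, Vt.shape)
print("max reconstruction error:", np.abs(A - A_rebuilt).max())
```

In practice `full_matrices=False` is often preferred, since it skips the columns of $\mathbf{U}$ that are multiplied by the zero rows of $\mathbf{\Sigma}$.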
The intuition behind $\mathbf{\Sigma}$ is critical: the nonzero singular values $\sigma_i$ are the square roots of the nonzero eigenvalues of $\mathbf{A}^T\mathbf{A}$. They quantify the 'strength' of each dimension. If we find that $\sigma_1 \gg \sigma_{10}$, the first singular direction carries far more of the data's energy (measured by $\sigma_i^2$) than the tenth. By keeping the $k$ largest singular values and setting the rest to zero, we obtain a rank-$k$ approximation of the original matrix. By the Eckart-Young theorem, this is the closest rank-$k$ matrix to $\mathbf{A}$ in the Frobenius norm, so the data is compressed without losing its primary structural characteristics.
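Both claims in this paragraph can be verified on a small example. The following sketch assumes a random $8 \times 5$ matrix and a hypothetical helper named `low_rank_approx`; it compares the singular values with the eigenvalues of $\mathbf{A}^T\mathbf{A}$ and checks that the Frobenius error of the truncation equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def low_rank_approx(A, k):
    """Return the rank-k truncated SVD of A and the full list of singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))
k = 2
A_k, s = low_rank_approx(A, k)

# Eigenvalues of A^T A, reversed into descending order to line up with s.
eig = np.linalg.eigvalsh(A.T @ A)[::-1]
# Clip guards against tiny negative values caused by rounding.
sigma_from_eig = np.sqrt(np.clip(eig, 0.0, None))

# Frobenius error of the truncation versus the discarded singular values.
err = np.linalg.norm(A - A_k, ord="fro")
expected_err = np.sqrt(np.sum(s[k:] ** 2))
print("singular values:        ", s)
print("sqrt of eig(A^T A):     ", sigma_from_eig)
print("truncation error:       ", err)
print("from discarded sigmas:  ", expected_err)
```

Computing singular values through $\mathbf{A}^T\mathbf{A}$ is shown here only to illustrate the relationship; forming that product squares the condition number, so a direct SVD routine is generally the safer choice numerically.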
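This leads us directly to Principal Component Analysis (PCA). PCA is essentially the application of SVD to a centered data matrix $\mathbf{X}$ (one row per sample, with the mean of each feature subtracted). With $m$ samples, the covariance matrix is $\mathbf{C} = \mathbf{X}^T\mathbf{X}/(m-1)$, so the right singular vectors of $\mathbf{X}$ (the columns of $\mathbf{V}$) are the eigenvectors of $\mathbf{C}$, that is, the principal components, and the eigenvalues are $\lambda_i = \sigma_i^2/(m-1)$. Projecting $\mathbf{X}$ onto the first $k$ columns of $\mathbf{V}$ gives the $k$-dimensional coordinates $\mathbf{X}\mathbf{V}_k = \mathbf{U}_k\mathbf{\Sigma}_k$. Mapping these coordinates back to the original space yields the reconstruction with the smallest squared error among all $k$-dimensional linear projections: $$\hat{\mathbf{X}} = \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^T$$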
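A compact way to see PCA as "SVD on centered data" is the sketch below. The function name `pca_svd`, its return values, and the random test data are assumptions made for illustration; this is not the interface of any specific library.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD. Rows of X are samples, columns are features."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # rows are the top-k principal directions
    scores = Xc @ components.T          # k-dimensional coordinates of each sample
    # Variance along each component: sigma_i^2 / (m - 1), with m samples.
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return mean, components, scores, explained_variance

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

mean, components, scores, var = pca_svd(X, k=2)

# Rank-k reconstruction; the mean is added back because X itself was not centered.
X_hat = scores @ components + mean
print("scores shape:", scores.shape)
print("variance along the first two components:", var)
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

The `explained_variance` values should agree with the two largest eigenvalues of `np.cov(X, rowvar=False)`, which ties the SVD route back to the covariance eigendecomposition described earlier.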
In summary, eigenvalues tell us 'how much' variance exists, and eigenvectors (or singular vectors) tell us 'in which direction' that variance lies. SVD provides a robust numerical framework to extract these directions for any dataset, regardless of shape. By manipulating the spectrum of singular values, we can filter noise, compress images, and visualize high-dimensional data, turning an unwieldy set of features into a compact, interpretable coordinate system.