Singular Value Decomposition (SVD) and Eigenvalues in Dimensionality Reduction

To understand dimensionality reduction, we must first grasp the intuition of 'redundancy.' In high-dimensional datasets, many features are often highly correlated, meaning the data doesn't actually occupy the full volume of its coordinate space but instead lies on a lower-dimensional manifold. The goal of dimensionality reduction is to find a new coordinate system—a set of basis vectors—where the most important information is concentrated in as few dimensions as possible, allowing us to discard the 'noise' without losing the 'signal.'

The mathematical foundation starts with the Eigendecomposition of a covariance matrix. Given a centered data matrix $X$, the covariance matrix is $C = \frac{1}{n-1} X^T X$. We seek the eigenvectors $v$ and eigenvalues $\lambda$ such that $Cv = \lambda v$. The eigenvectors represent the directions of maximum variance (Principal Components), and the eigenvalues $\lambda$ quantify the amount of variance explained by each direction. By sorting these eigenvalues in descending order, we can identify which directions are critical and which are negligible.

While Eigendecomposition works for square matrices (like the covariance matrix), Singular Value Decomposition (SVD) is a more general tool applicable to any $m \\× n$ matrix $A$. SVD factorizes the matrix into three distinct components: $A = U\Sigma V^T$. Here, $U$ is an $m \\× m$ orthogonal matrix representing the left singular vectors, $\Sigma$ is an $m \\× n$ diagonal matrix containing the singular values $\sigma_i$, and $V^T$ is an $n \\× n$ orthogonal matrix representing the right singular vectors.

The deep connection between SVD and Eigenvalues is revealed when we examine the product $A^T A$. If we substitute the SVD into this expression: $$A^T A = (V\Sigma^T U^T)(U\Sigma V^T) = V\Sigma^T \Sigma V^T$$ Since $U$ is orthogonal ($U^T U = I$), we see that $A^T A = V\Sigma^2 V^T$. This is exactly the form of an Eigendecomposition, where the columns of $V$ are the eigenvectors of $A^T A$ and the squares of the singular values $\sigma_i^2$ are the eigenvalues $\lambda_i$.

In the context of dimensionality reduction, specifically Principal Component Analysis (PCA), we use the truncated SVD. Instead of keeping all singular values, we keep only the top $k$ values in $\Sigma$ and set the rest to zero. This creates a low-rank approximation of the original matrix, $A_k = U_k \Sigma_k V_k^T$. According to the Eckart-Young-Mirsky theorem, this approximation is the optimal rank-$k$ representation of the matrix in terms of the Frobenius norm, minimizing the reconstruction error: $\\|A - A_k\\|_F$.

Ultimately, SVD allows us to project the original data onto a lower-dimensional subspace while preserving the maximum possible variance. By transforming our data from the original basis to the basis provided by $V$, we decorrelate the features. This not only reduces the storage requirements and computational complexity for downstream machine learning models but also helps mitigate the 'curse of dimensionality,' effectively filtering out stochastic noise and revealing the underlying latent structure of the data.