At its core, dimensionality reduction is the quest to find a lower-dimensional representation of data that preserves as much 'information' as possible. In the context of linear algebra, 'information' is often equated with variance. If we imagine a cloud of data points in 3D space, and that cloud is shaped like a flattened pancake, we can describe the position of most points using only two dimensions without losing much detail. The goal of SVD and Eigen-analysis is to mathematically identify the axes along which the data stretches the most, allowing us to discard the axes with negligible stretch.
To understand this, we start with the Eigenvalue problem. For a square, symmetric matrix $A$ (such as a covariance matrix), an eigenvector $v$ is a vector that, when multiplied by $A$, only changes in scale, not direction. This is expressed as $Av = \\lambda v$, where $\\lambda$ is the eigenvalue. In dimensionality reduction, the eigenvectors of the covariance matrix point in the directions of maximum variance, and the corresponding eigenvalues $\\lambda$ quantify the amount of variance along those directions. This is the fundamental engine behind Principal Component Analysis (PCA).
However, not every dataset is represented by a square matrix. This is where Singular Value Decomposition (SVD) becomes the more generalized tool. SVD states that any $m \\× n$ matrix $X$ can be factorized into three matrices: $X = U \Sigma V^T$. Here, $U$ is an $m \\× m$ orthogonal matrix representing the left singular vectors, $\Sigma$ is an $m \\× n$ diagonal matrix containing the singular values $\sigma_i$, and $V^T$ is the transpose of an $n \\× n$ orthogonal matrix representing the right singular vectors.
The relationship between SVD and Eigen-analysis is profound. If we compute the product $X^T X$, we obtain a square symmetric matrix. Substituting the SVD form, we get $X^T X = (V \Sigma^T U^T)(U \Sigma V^T)$. Since $U$ is orthogonal, $U^T U = I$, simplifying the expression to $X^T X = V \Sigma^T \Sigma V^T$. This is exactly the form of an eigendecomposition, where the columns of $V$ are the eigenvectors of $X^T X$ and the squares of the singular values $\sigma_i^2$ are the eigenvalues $\\lambda_i$.
For dimensionality reduction, we employ a technique called 'truncated SVD'. Because the singular values in $\Sigma$ are typically sorted in descending order ($\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$), the first few singular values capture the vast majority of the data's energy. By keeping only the top $k$ singular values and zeroing out the rest, we create a low-rank approximation of the original matrix: $X_k = U_k \Sigma_k V_k^T$. This process effectively filters out noise and retains the most dominant patterns.
The practical implication is that $V_k$ provides the principal axes of the data. By projecting the original data $X$ onto these axes via $X_{reduced} = X V_k$, we transform our high-dimensional features into a small set of uncorrelated components. This not only reduces the computational burden for downstream machine learning models but also mitigates the 'curse of dimensionality,' preventing overfitting by removing linear redundancies in the feature set.