# Singular Value Decomposition Part 2: Theorem, Proof, Algorithm Math ∩ Programming

I’m just going to jump right into the definitions and rigor, so if you haven’t read the previous post motivating the singular value decomposition, go back and do that first. This post will be theorem, proof, algorithm, data. The data set we test on is a thousand-story CNN news data set. All of the data, code, and examples used in this post is in a github repository, as usual.

We start with the best-approximating \$latex k\$-dimensional linear subspace.

Definition: Let \$latex X = { x_1, dots, x_m }\$ be a set of \$latex m\$ points in \$latex mathbb{R}^n\$. The best approximating \$latex k\$-dimensional linear subspace of \$latex X\$ is the \$latex k\$-dimensional linear subspace \$latex V subset mathbb{R}^n\$ which minimizes the sum of the squared distances from the points in \$latex X\$ to \$latex V\$.

Let me clarify what I mean by minimizing the sum of squared distances. First we’ll start with the…

View original post 5,066 more words