How do you find the homography matrix given 4 points in both images?

Question

I want to understand the process of finding a homography matrix given 4 points in both images. I am able to do that in Python with OpenCV, but I wonder how it works behind the scenes.

Suppose I have points $p_1, p_2, p_3, p_4$ in the first image and $p'_1, p'_2, p'_3, p'_4$ in the second. How do I generate the homography matrix from these points?

nbro · Accepted Answer · 2020-05-10T22:17:25.727

To understand homographies and how to find them, you will need a good dose of projective geometry. I will briefly describe some preliminary concepts that you need to know before trying to find the homography, but don't expect to understand all of them from a single reading of this answer, especially if you don't know what homogeneous coordinates are. For more details, I suggest you read the book Multiple View Geometry in Computer Vision (2004) by Richard Hartley and Andrew Zisserman, in particular, chapter 4.

The projective space $\mathbb{P}^2$

$\mathbb{P}^2$ is the projective space of $\mathbb{R}^2$, so it is $\mathbb{R}^2$ augmented with points at infinity and the line at infinity. Both points and lines of $\mathbb{P}^2$ are represented by non-zero vectors in $\mathbb{R}^3$, i.e. vectors of three components, which are the homogeneous representations of their counterparts in $\mathbb{R}^2$ (if the counterpart exists, e.g. points at infinity do not exist in $\mathbb{R}^2$). For example, the point $(x, y) \in \mathbb{R}^2$ is represented by $(x, y, 1)^\top$, and any non-zero multiple $(kx, ky, k)^\top$ represents the same point; points at infinity are those whose third component is $0$.
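To make the homogeneous representation concrete, here is a minimal NumPy sketch. The helper names `to_homogeneous` and `from_homogeneous` are mine (they are not OpenCV or NumPy functions) and are only meant as an illustration.

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1 to a 2D point (x, y) to get (x, y, 1)."""
    p = np.asarray(p, dtype=float)
    return np.append(p, 1.0)

def from_homogeneous(x):
    """Divide by the third component to recover (x, y)."""
    x = np.asarray(x, dtype=float)
    if np.isclose(x[2], 0.0):
        # Third component 0: a point at infinity, no counterpart in R^2
        raise ValueError("point at infinity has no counterpart in R^2")
    return x[:2] / x[2]

p = to_homogeneous([3.0, 4.0])   # homogeneous representation of (3, 4)
q = 2.5 * p                      # a different vector, but the same projective point
print(from_homogeneous(p), from_homogeneous(q))
```

Both `p` and `q` map back to the same point of $\mathbb{R}^2$, which is what "defined up to scale" means.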

What is a homography?

A homography (aka projectivity, collineation or projective transformation) is an invertible map $h$ from the projective space $\mathbb{P}^2$ to itself, $$h: \mathbb{P}^2 \rightarrow \mathbb{P}^2,$$ such that $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ lie on the same straight line if and only if $h(\mathbf{x}_1), h(\mathbf{x}_2), h(\mathbf{x}_3)$ do.

In other words, a homography preserves collinearity. In practice, this means that a projective transformation maps straight lines in one image to straight lines in the other image, but it cannot map e.g. straight lines to parabolas (and vice-versa). So, if you have a distorted image in which straight lines appear curved (e.g. because of lens distortion), a homography cannot convert it to a non-distorted image.
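You can check this property numerically. The sketch below uses an arbitrary invertible matrix `H` that I made up for illustration, and tests collinearity with the fact that three homogeneous points are collinear exactly when the determinant of the $3 \times 3$ matrix formed by them is zero.

```python
import numpy as np

def are_collinear(x1, x2, x3, tol=1e-9):
    """Three homogeneous points are collinear iff det[x1 x2 x3] = 0."""
    return abs(np.linalg.det(np.column_stack([x1, x2, x3]))) < tol

# An arbitrary invertible 3x3 matrix, chosen only for illustration
H = np.array([[1.0,   0.2,    3.0],
              [0.1,   1.5,   -2.0],
              [0.001, 0.002,  1.0]])

# Three points on the line y = 2x + 1, in homogeneous coordinates
on_line = [np.array([0.0, 1.0, 1.0]),
           np.array([1.0, 3.0, 1.0]),
           np.array([2.0, 5.0, 1.0])]
off_line = np.array([2.0, 6.0, 1.0])   # not on that line

mapped = [H @ x for x in on_line]
print(are_collinear(*on_line), are_collinear(*mapped))
print(are_collinear(mapped[0], mapped[1], H @ off_line))
```

The mapped points stay collinear, and a point that was off the line stays off the mapped line.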

All and only invertible linear maps in $\mathbb{P}^2$ are homographies

All invertible linear maps on the homogeneous coordinates of $\mathbb{P}^2$ are homographies, and only such maps can be homographies. Hence, when you want to find a homography, you are looking for an invertible linear map in $\mathbb{P}^2$.

A homography can be represented as $3 \times 3$ matrix

Given that homographies are linear maps, they can be represented as an invertible matrix

$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \in \mathbb{R}^{3 \times 3}, $$ such that, $\forall \mathbf{x} \in \mathbb{P}^2$, the following equation holds $$ h(\mathbf{x}) = \mathbf{H}\mathbf{x} $$

Homographies map points in $\mathbb{P}^2$ to points in $\mathbb{P}^2$

Given that homographies map points in $\mathbb{P}^2$ to other points in $\mathbb{P}^2$ and there's a matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ for each homography $h$, then it follows that, $\forall \mathbf{x} \in \mathbb{P}^2$, \begin{align} \mathbf{x}' &= \mathbf{H}\mathbf{x} \\ \begin{bmatrix} \mathbf{x}'_1 \\ \mathbf{x}'_2 \\ \mathbf{x}'_3 \end{bmatrix} &= \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix} \tag{1}\label{1} \end{align} where $\mathbf{x}' \in \mathbb{P}^2$, for some homography $\mathbf{H} \in \mathbb{R}^{3 \times 3}$.

In practice, this means that you can transform points in an image $I_1$ to points in image $I_2$ by matrix multiplication.
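As a minimal sketch of that multiplication in NumPy: the function name `apply_homography` is hypothetical, and the matrix below is just a translation by $(10, 20)$ that I picked so that the result is easy to verify by hand. Note the final division by the third component, which brings the result back to pixel coordinates.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of 2D points through H and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    x = np.hstack([pts, ones])        # (N, 3): points in homogeneous coordinates
    xp = x @ H.T                      # each row is H @ x for one point
    return xp[:, :2] / xp[:, 2:3]     # divide by the third component

# Illustrative homography: a pure translation by (10, 20)
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0,  1.0]])

mapped = apply_homography(H, [[0, 0], [5, 5]])
print(mapped)
```

For a general homography the third component of $\mathbf{H}\mathbf{x}$ is not $1$, so the division is what makes the mapping non-linear in pixel coordinates even though it is linear in homogeneous coordinates.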

A homography has $8$ degrees of freedom

Although the matrix $\mathbf{H}$ has $9$ entries, it has only $8$ degrees of freedom, which means that, in practice, there are only $8$ variables that you need to find. This property comes from the fact that only the ratio between the elements of $\mathbf{H}$ actually counts. This means that equation \ref{1} can actually be written as \begin{align} \mathbf{x}' = \lambda \mathbf{H}\mathbf{x} \tag{2}\label{2} \end{align} for all $\lambda \in \mathbb{R} \setminus \{0 \}$.
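A quick numerical check of this scale invariance, using an arbitrary matrix and point chosen only for illustration: multiplying $\mathbf{H}$ by any non-zero $\lambda$ changes the homogeneous vector but not the point it represents.

```python
import numpy as np

# Arbitrary invertible matrix and point, chosen only for illustration
H = np.array([[1.0,   0.2,    3.0],
              [0.1,   1.5,   -2.0],
              [0.001, 0.002,  1.0]])
x = np.array([40.0, 25.0, 1.0])

a = H @ x
b = (7.0 * H) @ x                       # lambda = 7

# Different vectors in R^3, same point after dividing by the third component
same_point = np.allclose(a[:2] / a[2], b[:2] / b[2])
print(same_point)

# Since the scale is free, one common convention is to fix it,
# e.g. by giving H unit Frobenius norm
H_unit = H / np.linalg.norm(H)
```

Fixing the scale in some way (unit norm, or $h_{33} = 1$ when $h_{33} \neq 0$) is what removes the ninth unknown.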

How to estimate a homography from point correspondences?

We want to estimate a homography $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ from point-to-point correspondences $\{(\mathbf{x}^i, \mathbf{x}'^i) \}_{i=1}^N$, where $N \geq 4$, such that $\mathbf{x}'^i = \mathbf{H}\mathbf{x}^i, \forall i$.

First, recall that the matrix $\mathbf{H}$ has $8$ degrees of freedom (i.e. variables we want to find).

Each point-to-point correspondence represents $2$ constraints

Each point-to-point correspondence $(\mathbf{x}^i, \mathbf{x}'^i)$ accounts for $2$ constraints, i.e. $\mathbf{H}\mathbf{x}^i$ maps to the point $\mathbf{x}'^i$, which has $2$ degrees of freedom (even if it's defined by $3$ components) because it's represented in homogeneous coordinates and so, as in equation \ref{2}, it's defined "up to scale". In other words, these $2$ degrees of freedom represent $2$ constraints.

$4$ point-to-point correspondences are necessary to estimate a homography

Given that $1$ point-to-point correspondence represents $2$ constraints, $4$ point-to-point correspondences correspond to $8$ constraints. Given this and given that homographies have $8$ degrees of freedom, at least $4$ point-to-point correspondences are necessary to estimate a homography. For the $8$ constraints to be independent, the points must be in general position, i.e. no three of them collinear.

The equation \ref{2} can actually be written as

$$ \mathbf{x}' \times \lambda \mathbf{H}\mathbf{x} = \mathbf{0} \label{3} \tag{3} $$ where $\times$ is the cross-product.
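The reason the cross-product is useful here is that it is zero exactly when two vectors are parallel, i.e. when they represent the same point up to scale. A small check, with an arbitrary matrix and point of my own choosing:

```python
import numpy as np

# Arbitrary invertible matrix and point, chosen only for illustration
H = np.array([[1.0,   0.2,    3.0],
              [0.1,   1.5,   -2.0],
              [0.001, 0.002,  1.0]])
x = np.array([40.0, 25.0, 1.0])

Hx = H @ x
x_prime = 3.0 * Hx            # same projective point as Hx, different scale

equal_as_vectors = np.allclose(x_prime, Hx)
cross_is_zero = np.allclose(np.cross(x_prime, Hx), 0.0)
print(equal_as_vectors, cross_is_zero)
```

The two vectors are not equal as elements of $\mathbb{R}^3$, but their cross-product vanishes, so equation 3 holds without having to know the scale.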

Direct linear transformation

If you manipulate equation \ref{3}, then you will end up with another equation (if you ignore the scaling factor $\lambda$, which we can ignore because the homography converts points independently of their magnitude)

$$ \mathbf{A}_i \mathbf{h} = \mathbf{0} $$ where $\mathbf{A}_i \in \mathbb{R}^{2 \times 9}$ is the design matrix (i.e. the matrix that contains the input data to estimate the homography) that contains the elements of the point-to-point correspondence $(\mathbf{x}^i, \mathbf{x}'^i)$ and $\mathbf{h} \in \mathbb{R}^{9}$ is a vector that contains the unknown elements of $\mathbf{H}$. The details of this manipulation can be found in section 4.1 of the book Multiple View Geometry in Computer Vision.
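As a sketch of that manipulation (following the cited section, with notation that I am assuming here): drop the index $i$, write the components of the two points as $\mathbf{x} = (x_1, x_2, x_3)^\top$ and $\mathbf{x}' = (x'_1, x'_2, x'_3)^\top$, and let $\mathbf{h}_1^\top, \mathbf{h}_2^\top, \mathbf{h}_3^\top$ be the rows of $\mathbf{H}$, so that $\mathbf{h} = (\mathbf{h}_1^\top, \mathbf{h}_2^\top, \mathbf{h}_3^\top)^\top$. Then

$$ \mathbf{x}' \times \mathbf{H}\mathbf{x} = \begin{bmatrix} x'_2 \, \mathbf{h}_3^\top \mathbf{x} - x'_3 \, \mathbf{h}_2^\top \mathbf{x} \\ x'_3 \, \mathbf{h}_1^\top \mathbf{x} - x'_1 \, \mathbf{h}_3^\top \mathbf{x} \\ x'_1 \, \mathbf{h}_2^\top \mathbf{x} - x'_2 \, \mathbf{h}_1^\top \mathbf{x} \end{bmatrix} = \mathbf{0} $$

Each row is linear in $\mathbf{h}$, and the third row is a linear combination of the first two ($x'_1$ times the first plus $x'_2$ times the second equals $-x'_3$ times the third), so only two rows are kept:

$$ \mathbf{A}_i = \begin{bmatrix} \mathbf{0}^\top & -x'_3 \, \mathbf{x}^\top & x'_2 \, \mathbf{x}^\top \\ x'_3 \, \mathbf{x}^\top & \mathbf{0}^\top & -x'_1 \, \mathbf{x}^\top \end{bmatrix} $$

These are the $2$ constraints contributed by one correspondence.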

Now, the idea is that you can vertically stack $N \geq 4$ equations of the form $\mathbf{A}_i \mathbf{h} = \mathbf{0}$ to build the final linear system that you would like to solve $$\mathbf{A} \mathbf{h} = \mathbf{0} \tag{4}\label{4}$$ where $\mathbf{A} \in \mathbb{R}^{2N \times 9}$.

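The solution we are after is a non-zero vector $\mathbf{h} \in \mathbb{R}^{9}$ (the trivial solution $\mathbf{h} = \mathbf{0}$ is of no use), which, rearranged into a $3 \times 3$ matrix, is your homography!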

If you know something about linear algebra, you know that the solutions to $\mathbf{A} \mathbf{h} = \mathbf{0}$ are elements of the null space of $\mathbf{A}$.

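Then, to find $\mathbf{h}$, you will typically use singular value decomposition (SVD): $\mathbf{h}$ is taken to be the right singular vector of $\mathbf{A}$ associated with the smallest singular value. With exactly $4$ correspondences in general position, this vector spans the null space; with more (noisy) correspondences, there is usually no exact non-zero solution, and this vector is the unit vector that minimizes $\|\mathbf{A}\mathbf{h}\|$. See e.g. the question "How is the null space related to singular value decomposition?" for a possible explanation of why you can use SVD to find an element of the null space.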
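Here is a small NumPy illustration of the SVD step on its own, using a toy $3 \times 3$ matrix of rank $2$ (not a real design matrix) so that the null space is one-dimensional:

```python
import numpy as np

# A rank-2 matrix, so its null space is one-dimensional
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# np.linalg.svd returns singular values in descending order,
# and the rows of Vt are the right singular vectors
_, s, Vt = np.linalg.svd(A)
h = Vt[-1]                       # vector for the smallest singular value

residual = np.linalg.norm(A @ h)
print(s[-1], residual)           # both numerically close to zero
```

The returned `h` has unit norm, which also takes care of the free scale factor.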

This algorithm (which is described on page 91 of the cited book, i.e. Algorithm 4.1) is called direct linear transformation.
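Putting the pieces together, here is a minimal NumPy sketch of the basic algorithm. It is my own illustration, not OpenCV's implementation; the function name `estimate_homography_dlt` is hypothetical, the input points are assumed to be given as pixel coordinates (so their third homogeneous component is $1$), and $\mathbf{h}$ is assumed to stack the rows of $\mathbf{H}$. It also skips the normalization of the points that the book recommends for real, noisy data.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Basic DLT: src and dst are (N, 2) arrays of matching points, N >= 4."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 matching 2D points")
    rows = []
    zero = np.zeros(3)
    for (x, y), (xp, yp) in zip(src, dst):
        X = np.array([x, y, 1.0])
        # The two rows of A_i for this correspondence
        rows.append(np.concatenate([zero, -X, yp * X]))
        rows.append(np.concatenate([X, zero, -xp * X]))
    A = np.vstack(rows)                  # design matrix, shape (2N, 9)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)             # h stacks the rows of H
    return H / H[2, 2]                   # fix the scale; assumes h33 != 0

# Synthetic check: generate 4 correspondences from a known homography
H_true = np.array([[2.0,   0.1,    5.0],
                   [0.2,   1.5,   -3.0],
                   [0.001, 0.002,  1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
src_h = np.hstack([src, np.ones((4, 1))])
dst_h = src_h @ H_true.T
dst = dst_h[:, :2] / dst_h[:, 2:3]

H_est = estimate_homography_dlt(src, dst)
recovered = np.allclose(H_est, H_true, atol=1e-6)
print(recovered)
```

With exactly $4$ noise-free correspondences, the estimate should match the matrix used to generate them, up to the scale that is fixed in the last line of the function.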

A wonderful explanation. – Hissaan Ali, May 10 '20 at 21:57