The Hough transform is a feature extraction technique used in image analysis and computer vision. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. The classical Hough transform was developed with the intention of being able to identify straight lines in the image, but later the Hough transform has been extended to identifying positions of arbitrary shapes, most commonly circles or ellipses. This post will involve a bit of math, but just elementary concepts you learned in school. Also, we'll work with lines only, though the technique can be easily extended to other shapes.
Why Hough transform?
Suppose you have image of straight road. You figure out edge pixels (using the Canny edge detector, the Sobel edge detector, or any other thing). Now you want a geometrical representation of the pole's edge. But right now the "edge" is just a sequence of pixels. You can loop through all pixels, and some how figure out the slope and intercept. But that is one difficult task. Images are never perfect. So you want some mechanism that give more weightage to pixels that are already in a line. This is exactly what the Hough Transform does. It lets each point on the image "vote". And because of the mathematical properties of the transform, this "voting" allows us to figure out prominent lines in the image. Don’t worry if you like a tail-ender facing a bouncer ball. You should soon be able to face them with ease.!
The mathematics behind hough transform:
Convert line to point:
A lines is a collection of points. And managing a collection of points is tougher than managing a single point. Obviously. So the first thing we learn is how to represent a line as a single point, without losing any information about it.
This is done through the m-c space.
So the parameter space, or the mc space was devised. Every line in the xy space is equivalent to a single point in the mc space.
Convert point to line
Now onto the next step. Consider a point (say, (xa, ya) )in the xy space. What would its representation in the mc space be?
For starters, you could guess that infinite lines pass through a point. So, for every line passing through (xa, ya), there would be a point in the mc space.
And you're correct there, because of the following:
- Any line passing through (xa, ya): ya = mxa + c
- Rearranging: c = - xam + ya
- The above is the equation of a line in the mc space.
So, a point in the xy space is equivalent to a line in the mc space.
Now we know conversion of points to line and vice-versa. Now, if we observe each intersection point, then we can easily see that intersection point of line 1 and line 2 represents the slope and intercept of the line passing through point 1 and point 2. Similary, intersection point of line 3 and line 4 represents the slope and the intercept of the line passing through point 3 and point 4.
The interesction point of line 1, line 2 and line 3 represents the slope and the intercept of the line passing through point 1, point 2 and point 3.
So we can see that the point which is the common interception of more number of lines will represent the slope and intercept of the line passing through more number of points. And similary, if a point has only one passing through line in mc space, it will represent a line which passes through only one edge point. If a large number of lines intersect at same point, we can say that the line in xy space for that point is more dominant line. Thus, we can identify edge points in image which lie on same straight line.
Hough Space:
There is a flaw in the method which we just discussed. The value of slope i.e m tends to infinity for the vertical lines. So, we need to have infinite memory to store the ‘mc space’. So we have a work around this which will helps us maintain the concept but won’t have high memory requirements.
We use polar co-ordinates for this purpose.
r = x1cos(Θ)+y1sin(Θ)
where Θ is the angle that normal from the origin to the line makes with the x-axis and r is the perpendicular distance of the line from the origin.
Range of parameters:
- Θ varies from -90 to +90.
- r varies from 0 to the diagonal length of the image.
With this new representation, we have a new transition from xy plane to Hough plane. A line in the xy space still represents a point in Hough space (pΘ plane here). But a point in xy space represents a sinusoid in r-Θ space.
Actual implementation:
We have edge points in the images detected using one of edge detector i.e canny edge detector, sober detector and so on. Now every non-black pixel (i.e pixel which lies on edge ) is represented on hough space. A 2D accumulator is created which accumulates votes of each point represented by each cell of the accumulator. In our case, the accumulator cells form a 2D array. The horizontal axis is for the different θ values and the vertical axis for p values.
You choose a sinusoidal and vary Θ to get different r. You take the value of Θ and obtain corresponding value of r and cast a vote in the particular (Θ, r) accumulator cell. By voting, you increase the value of the cell by 1. After voting, you choose another value of Θ and vote in corresonding accumulator cell by Θ and r. Continue doing this from Θ=-90 to Θ=+90. And for every such calculation, you add vote in the accumulator.
Depending on the number of votes at each cell in accumulator, we will obtain the line which will be the best fit for a certain set of points. More the number of votes at a particular cell (i.e. r and Θ), more number of points lie on that line. In the hough space diagram, the bright spots denote more number of points voted for that spot. Thus, we can have straight lines in the image detected using hough transform. Now, get your fingers ready and start implementing it on your machine!!
Subscribe to regularly get updates in your mail box.
Subscribe to regularly get updates in your mail box.