I built a basic document scanner with OpenCV that detects a four-sided document in an image and warps it into a flat, head-on view. It is based on the excellent "How to Build a Kick-Ass Mobile Document Scanner in Just 5 Minutes" tutorial from PyImageSearch.
The basic steps are:
- load the image and downscale it (apparently edge detection does not work well on large images)

- detect edges

- find all contours and keep the 5 largest
- convert the contours to polygon approximations (e.g. if a contour is a list of many points tracing out an n-sided polygon, reduce it to just the n points of a similar approximated polygon; this is done using the Ramer-Douglas-Peucker algorithm)
- store the 4 points of the largest contour that can be approximated as a 4-sided polygon

- calculate the width and height of the output image based on the 4-sided polygon
- perspective-warp the image by corner-pinning the 4 points of the polygon to the 4 corners of the output image; this straightens out the sides into a frontal view
- apply thresholding for a photocopy look
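
To make the polygon approximation step above more concrete, here is a minimal pure-Python sketch of the Ramer-Douglas-Peucker idea. In OpenCV this step is normally handled by `cv2.approxPolyDP`; the function names `point_segment_distance` and `rdp`, the sample `outline`, and the tolerance of 3 pixels are my own illustrative assumptions, not code from the tutorial.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (a == b): fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Simplify a polyline, dropping points within epsilon of the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Everything in between is close enough: keep only the endpoints.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

# A slightly wobbly rectangle outline that starts and ends at the same corner.
outline = [(0, 0), (25, 1), (50, 0), (75, -1), (100, 0), (101, 25),
           (100, 50), (50, 51), (0, 50), (-1, 25), (0, 0)]
simplified = rdp(outline, 3.0)
print(simplified)
```

With this tolerance the wobble is discarded and only the four corners (plus the repeated starting point) remain, which is the property the scanner relies on when it looks for a contour that reduces to 4 points.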

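The width and height calculation from the steps above can be sketched in plain Python as well. This is only one reasonable approach, assuming the 4 corners arrive as `(x, y)` tuples in arbitrary order and that the longer of each pair of opposite edges is used as the output dimension; the helper names `order_corners`, `output_size`, and `destination_corners` and the sample coordinates are hypothetical.

```python
import math

def order_corners(pts):
    """Return corners as (top-left, top-right, bottom-right, bottom-left)."""
    # In image coordinates y grows downward, so the top-left corner has the
    # smallest x + y and the bottom-right corner the largest.
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    # The top-right corner has the largest x - y, the bottom-left the smallest.
    tr = max(pts, key=lambda p: p[0] - p[1])
    bl = min(pts, key=lambda p: p[0] - p[1])
    return tl, tr, br, bl

def edge_length(a, b):
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def output_size(pts):
    """Output (width, height): the longer of each pair of opposite edges."""
    tl, tr, br, bl = order_corners(pts)
    width = max(edge_length(tl, tr), edge_length(bl, br))
    height = max(edge_length(tl, bl), edge_length(tr, br))
    return int(round(width)), int(round(height))

def destination_corners(width, height):
    """Corners of the output image, in the same order as order_corners."""
    return [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]

# Example quadrilateral, with corners deliberately given out of order.
corners = [(420, 80), (60, 100), (40, 560), (460, 520)]
width, height = output_size(corners)
print(order_corners(corners))
print(width, height)
print(destination_corners(width, height))
```

In an OpenCV script, the ordered source corners and these destination corners would typically be passed to `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective`, to do the actual corner pinning.
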
This method seems to work well only with text documents on white paper. I tried a bunch of more complicated examples, and the script did not manage to pick out the document boundaries.