I created a basic document scanner using OpenCV that detects a 4-sided document in an image and straightens it out into a full frontal view. It is based on the excellent "How to Build a Kick-Ass Mobile Document Scanner in Just 5 Minutes" tutorial from pyimagesearch.

The basic steps are:

  • load the image and downscale it (apparently the edge detection does not work well on large images)
  • detect edges
  • find all contours and keep the 5 largest
  • convert the contours to polygon approximations (e.g. if a contour is a list of many points tracing out an n-sided polygon, reduce it to n points that represent a similar approximated polygon; this is done using the Ramer-Douglas-Peucker algorithm)
  • store the 4 points of the largest contour that can be approximated as a 4-sided polygon
  • calculate the width and height of the output image based on the 4-sided polygon
  • perspective-warp the image by corner-pinning the 4 points of the polygon to the 4 corners of the output image – this straightens out the sides to a frontal view
  • apply thresholding for a photocopy look

This method seems to work well only with text documents on white paper. I tried a bunch of more complicated examples, and the script did not manage to pick out the document boundaries in them.