Page Detection

Page detection (also called page decomposition) is the most important feature of the image processor. Without it, PPP could not position images reliably. Page detection is the phase of image processing in which PPP extracts the subimage from the input image. It applies only to the Positioning mode; in simple mode, no page detection is performed.


The page detector is one of the most difficult parts of the program. A human can point out the boundaries of an image at a glance, but a program has no such intuition: it must separate dirt and speckles from the useful image. PPP tries to keep the speckles out of the subimage.


The image processor must therefore distinguish between speckles and the subimage. This is done independently of the Despeckle parameters: whether or not speckle removal is turned on, the page detector keeps speckles out of the subimage. Figure 4-1 shows the result of page detection.



Figure 4-1. Result of Subimage Extraction


The page detector will automatically find the boundaries of the subimage. The result will always be a rectangle around the subimage.
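The idea of a bounding rectangle that ignores speckles can be illustrated with a minimal sketch. This is not PPP's actual algorithm. It assumes a black-and-white image stored as a list of rows (0 = white, 1 = black) and treats any connected group of black pixels smaller than a hypothetical min_component_size as a speckle. The function name and parameter are illustrative only.

```python
from collections import deque


def find_subimage_rect(image, min_component_size=4):
    """Return (top, left, bottom, right), inclusive, of the rectangle that
    encloses every connected black region of at least min_component_size
    pixels. Returns None if no such region exists.

    Assumption: image is a list of equal-length rows of 0 (white) / 1 (black).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rect = None

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue

            # Flood-fill one 4-connected region, tracking its size and extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            size = 0
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                size += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            if size < min_component_size:
                continue  # too small: treated as a speckle and left out

            # Grow the overall rectangle to include this region.
            if rect is None:
                rect = (top, left, bottom, right)
            else:
                rect = (min(rect[0], top), min(rect[1], left),
                        max(rect[2], bottom), max(rect[3], right))
    return rect


# Two isolated speckles (top-left, bottom-right) around a 2x3 block of content.
page = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
print(find_subimage_rect(page))  # (2, 2, 3, 4)
```

With min_component_size set to 1, the speckles are no longer filtered out and the rectangle grows to cover the whole image. This shows why the size threshold matters for the result.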


Although page detection is fast and accurate, it can sometimes fail. For example, very large speckles, black borders, or dirt near the edge of the image can mislead the page detector. Most of these problems can be solved by setting up an Edge cleanup to clean the border.
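To show why cleaning the border helps, here is a minimal sketch of one possible cleanup step. It is an assumption for illustration, not a description of how PPP's Edge cleanup works: it simply sets a fixed margin around the image to white, so that black borders or dirt within that margin cannot reach the page detector. The function name clean_edges and the margin parameter are hypothetical.

```python
def clean_edges(image, margin):
    """Return a copy of image with a border of `margin` pixels set to white.

    Assumption: image is a list of equal-length rows of 0 (white) / 1 (black).
    The input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            in_border = (y < margin or y >= height - margin
                         or x < margin or x >= width - margin)
            if in_border:
                cleaned[y][x] = 0
    return cleaned


# A 4x4 image that is entirely black, as if surrounded by a scanner border.
scan = [[1] * 4 for _ in range(4)]
for row in clean_edges(scan, 1):
    print(row)
```

Only the inner 2x2 area stays black after cleaning with a margin of 1. A margin that is too wide would also erase real content near the edge, so the value has to suit the scanned material.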


Small page numbers and very small letters look much like speckles, so they are very hard to tell apart. There is a way to overcome this problem. Poor-quality images contain many large speckles, while good-quality images contain only a few small ones. It is therefore a good idea to process good- and poor-quality images separately, using different page detection settings. PPP comes with ready-made page detection settings that can be used for any kind of image. In addition, the page detection algorithm is documented, so you can fine-tune these parameters and work out page detection settings that meet your needs.



