Simple binary documents (without images): a simple heuristic is enough. Compute the mean of all pixel values (8-bit grayscale, 0 to 255); if the average is > 128 --> black text on white paper.
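A minimal sketch of the mean heuristic. It assumes the image is already loaded as a 2-D list of 8-bit gray values (0 = black, 255 = white); the function name is hypothetical:

```python
def is_dark_on_light_binary(pixels):
    """Mean heuristic for binary pages: True means black text on white paper.

    `pixels` is assumed to be a 2-D list of 8-bit gray values (0 or 255).
    """
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    # A page that is mostly white paper has a mean above the mid-point 128.
    return total / count > 128
```

For a 3x4 page with two black pixels the mean is 212.5, so the function returns True; the inverted page (white text on black) has a mean of 42.5 and returns False.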
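Simple documents that are not binary (i.e. with gradients etc.): compute the image histogram and, from it, the skewness of the gray-value distribution (https://en.wikipedia.org/wiki/Skewness). If the skewness is negative --> black text on white paper, because most pixels are bright paper and the dark ink forms a long tail toward low values.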
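A sketch of the skewness test, computing the moment coefficient m3 / m2^1.5 directly from a 256-bin histogram. It again assumes a 2-D list of integer gray values in 0..255; the helper names are hypothetical, and returning 0 for a completely uniform image is an arbitrary choice:

```python
def histogram(pixels, bins=256):
    """Count how many pixels take each 8-bit gray value (0..255)."""
    hist = [0] * bins
    for row in pixels:
        for p in row:
            hist[p] += 1
    return hist


def histogram_skewness(hist):
    """Skewness m3 / m2**1.5 of the gray-value distribution described by `hist`."""
    n = sum(hist)
    mean = sum(v * c for v, c in enumerate(hist)) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in enumerate(hist)) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in enumerate(hist)) / n
    if m2 == 0:
        return 0.0  # uniform image: skewness is undefined, report 0
    return m3 / m2 ** 1.5


def is_dark_text_on_light_paper(pixels):
    # Bright paper dominates and dark ink is a long tail toward 0 -> negative skew.
    return histogram_skewness(histogram(pixels)) < 0
```

Inverting the image (255 - p) flips the sign of the skewness, so white text on a dark background comes out positive.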
Non-simple document images, i.e. ones containing image content etc.: then you probably need to classify the image; a simple bag-of-(visual)-words scheme will probably work quite well.
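A toy bag-of-visual-words sketch, to show the idea rather than a production pipeline. It makes several assumptions: raw non-overlapping pixel patches stand in for real local descriptors such as SIFT; plain k-means (initialised from distinct patches) builds the codebook; and a 1-nearest-neighbour rule on word histograms stands in for a proper classifier such as an SVM. You need labeled training pages for each class. All function names are hypothetical:

```python
import random


def extract_patches(pixels, size=2):
    """Cut a 2-D gray image into non-overlapping size x size patches (flattened)."""
    h, w = len(pixels), len(pixels[0])
    return [
        [pixels[y + dy][x + dx] for dy in range(size) for dx in range(size)]
        for y in range(0, h - size + 1, size)
        for x in range(0, w - size + 1, size)
    ]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(vec, centers):
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the resulting centers are the visual words."""
    # Initialise from distinct patches so no two centers start out identical.
    unique = sorted({tuple(p) for p in points})
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(unique, min(k, len(unique)))]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(pixels, codebook, size=2):
    """Normalised histogram of visual-word occurrences in one image."""
    counts = [0] * len(codebook)
    patches = extract_patches(pixels, size)
    for p in patches:
        counts[nearest(p, codebook)] += 1
    return [c / len(patches) for c in counts]


def train(labeled_images, k=4, size=2):
    """Build the codebook and keep one word histogram per labeled training image."""
    all_patches = [p for img, _ in labeled_images for p in extract_patches(img, size)]
    codebook = kmeans(all_patches, k)
    model = [(bovw_histogram(img, codebook, size), label) for img, label in labeled_images]
    return codebook, model


def classify(pixels, codebook, model, size=2):
    """Label of the training image whose word histogram is closest."""
    hist = bovw_histogram(pixels, codebook, size)
    return min(model, key=lambda entry: sq_dist(hist, entry[0]))[1]
```

Usage: call `train` on a few pages labeled e.g. "dark-on-light" and "light-on-dark", then `classify` new pages. On real documents you would swap in proper descriptors, a larger codebook and a stronger classifier; the structure (descriptors, codebook, word histogram, classifier) stays the same.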