Classifier problem with characters or numbers

asked 2018-05-20 20:49:39 -0500

benz

I used HOG features and an SVM to train a classifier for characters and numbers for OCR. Everything seems to work well: I get a label value after prediction. I trained a multiclass SVM on the characters and numbers, with labels such as:

number 0  label value (20)
number 1  label value (21)
number 2  label value (22)
......
character A  label value (31)
character B  label value (32)
......

When I used findContours and got a contour with no characters or numbers inside, prediction still returned a value, 20 or something. What is the problem?


Comments

What is the problem?

Your expectation. It can only predict what it was trained on.
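To illustrate the point, here is a toy sketch in plain Python. It uses a nearest-centroid rule as a stand-in for the multiclass SVM (it is not the OpenCV API), with invented 2-D features and a few of the label values from the question. Whatever the input looks like, the prediction is always one of the trained labels.

```python
# Toy stand-in for a multiclass classifier: nearest centroid over trained labels.
# The centroids and feature values are invented for illustration only.
CENTROIDS = {
    20: (0.0, 0.0),   # number 0
    21: (1.0, 0.0),   # number 1
    31: (0.0, 1.0),   # character A
}

def predict(feature):
    """Return the trained label whose centroid is closest to `feature`."""
    def sq_dist(label):
        cx, cy = CENTROIDS[label]
        return (feature[0] - cx) ** 2 + (feature[1] - cy) ** 2
    return min(CENTROIDS, key=sq_dist)

# A feature far away from every training sample still gets a trained label.
junk = (-50.0, -40.0)
print(predict(junk) in CENTROIDS)  # True: there is no "none of the above" output
```

A real SVM draws its decision boundaries differently, but the same thing holds: the output set is fixed by the training labels.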

berak ( 2018-05-21 00:16:34 -0500 )

My expectation is that any other object found by findContours should return -1, while numbers and characters return the correct label value. Aside from predict, is there some other way to detect characters and numbers with an SVM? Or should I use DetectMultiScale to detect the whole ROI area?

benz ( 2018-05-21 01:12:38 -0500 )

My expectation for the other object found by findContours should be -1,

Why would that be so? You did not train it on such data, so the expectation is flawed.

One way to achieve this would be: instead of a multiclass SVM, have a separate one for each character, trained on this letter vs. everything else (including weird fly-shit). But you'll find that this is a much harder problem than your original approach.
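A rough sketch of how the per-character results could be combined, in plain Python. It assumes each binary classifier yields a signed decision value (positive meaning "this character", negative meaning "everything else"). The function name, the threshold, and the score values are hypothetical and not taken from OpenCV.

```python
REJECT = -1  # hypothetical "not a character" value, as expected in the question

def predict_with_reject(scores, threshold=0.0):
    """Pick the label with the highest one-vs-rest score, or REJECT.

    scores: dict mapping label -> signed decision value of that label's
    binary classifier (assumed: positive = this character, negative = rest).
    """
    best_label = max(scores, key=scores.get)
    if scores[best_label] <= threshold:
        return REJECT  # no classifier claimed the sample
    return best_label

print(predict_with_reject({20: -0.7, 21: 1.3, 31: -0.2}))   # 21
print(predict_with_reject({20: -0.9, 21: -0.4, 31: -1.1}))  # -1
```

The rejection only works as well as the "everything else" training data, which is a large part of why this approach is harder.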

Another approach might be to have a text detection stage before the classifier.
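As a very crude illustration of filtering candidates before classification, the sketch below drops contour bounding boxes whose size or aspect ratio is implausible for a glyph. It is only a geometric heuristic, not a real text detector. The boxes are assumed to be (x, y, w, h) tuples such as those returned by cv2.boundingRect for each contour, and all thresholds are made-up values that would need tuning.

```python
def plausible_glyph_boxes(boxes, min_area=100, min_aspect=0.2, max_aspect=1.2):
    """Keep (x, y, w, h) boxes whose area and width/height ratio look like a glyph.

    All thresholds are assumed values for illustration, not recommendations.
    """
    kept = []
    for x, y, w, h in boxes:
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        aspect = w / h
        if w * h >= min_area and min_aspect <= aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept

boxes = [(10, 10, 20, 40), (0, 0, 3, 3), (5, 60, 200, 8)]
print(plausible_glyph_boxes(boxes))  # [(10, 10, 20, 40)]
```

Only the boxes that survive the filter would then be cropped, described with HOG, and passed to the SVM.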

berak ( 2018-05-21 01:19:45 -0500 )

I'm trying the first approach. Thanks so much, berak.

benz ( 2018-05-21 20:20:16 -0500 )