# Classifier problem with characters or numbers

I used HOG and SVM to train and classify characters and numbers for OCR. Everything seems to work well, and I get a label value after prediction. I used a multiclass setup for the characters and numbers, for example:

number 0  label value (20)
number 1  label value (21)
number 2  label value (22)
......
character A  label value (31)
character B  label value (32)
......


When I used findContours and got a contour with no characters or numbers inside, prediction still returned a value, 20 or something similar. What is the problem?


What is the problem?

Your expectation. It can only predict what it was trained on.

( 2018-05-21 00:16:34 -0500 )

My expectation is that other objects found by findContours should return -1, while numbers and characters return the correct label value. Apart from predict, is there some other way to detect characters and numbers with an SVM? Or should I use DetectMultiScale to detect the whole ROI area?

( 2018-05-21 01:12:38 -0500 )

My expectation for the other object found by findContours should be -1,

Why would that be so? You did not train it on such data, so your expectation is flawed.

One way to achieve this would be: instead of a multiclass SVM, have a separate one for each character, each trained on "this letter vs. everything else" (including weird fly-specks and other non-text junk). But you'll find that this is a much harder problem than your original approach.
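A rough sketch of how the decision step of such a one-vs-rest setup could look, in plain Python. The function name `predict_with_reject`, the `scores` dictionary and the threshold of 0.0 are all made up for illustration. It assumes each per-character classifier gives you a signed score where positive means "this is my character", which you would have to obtain from your own SVMs:

```python
REJECT = -1  # hypothetical label for "none of the trained characters"

def predict_with_reject(scores, threshold=0.0):
    """Pick the best one-vs-rest label, or REJECT if no classifier is confident.

    scores: dict mapping label -> signed decision value from that label's
            binary classifier (assumed: positive means "this class").
    """
    if not scores:
        return REJECT
    # Label whose binary classifier gave the highest score.
    best_label = max(scores, key=scores.get)
    # Accept only if that score clears the threshold; otherwise reject.
    return best_label if scores[best_label] > threshold else REJECT

# A patch that the "0" classifier (label 20) accepts:
print(predict_with_reject({20: 1.3, 21: -0.7, 31: -1.1}))   # 20
# A patch that every classifier rejects (e.g. a noise contour):
print(predict_with_reject({20: -0.4, 21: -0.9, 31: -0.2}))  # -1
```

The -1 only shows up because every classifier saw "everything else" as negative examples during training, so the quality of the rejection depends on how representative that junk data is.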

Another approach might be to have a text detection stage before the classifier.
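As a very crude stand-in for such a stage (not a real text detector), you could at least throw away contours whose bounding box cannot plausibly be a character before calling predict. A minimal sketch, assuming boxes come as `(x, y, w, h)` tuples like the ones `cv2.boundingRect` returns; the helper name `looks_like_glyph` and all size and aspect limits are invented and would need tuning on your images:

```python
def looks_like_glyph(box, min_height=10, max_height=100,
                     min_aspect=0.1, max_aspect=1.2):
    """Return True if a bounding box has a plausible character shape.

    box: (x, y, w, h) tuple. All limits are illustrative assumptions.
    """
    x, y, w, h = box
    if w <= 0 or h <= 0:
        return False
    aspect = w / h  # characters are usually taller than wide
    return min_height <= h <= max_height and min_aspect <= aspect <= max_aspect

# Hypothetical boxes: a character-sized blob, a long thin line, a tiny speck.
boxes = [(5, 5, 12, 20), (0, 0, 300, 4), (40, 8, 2, 2)]
candidates = [b for b in boxes if looks_like_glyph(b)]
print(candidates)  # [(5, 5, 12, 20)]
```

Only the surviving candidates would then be passed to the HOG + SVM classifier. A shape filter like this will still let through non-text blobs of character-like size, which is why a proper detection stage is the more robust option.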

( 2018-05-21 01:19:45 -0500 )

I'm trying the first approach. Thanks so much, berak.

( 2018-05-21 20:20:16 -0500 )