
How to evaluate classifiers? Precision & recall?

asked 2013-09-25 16:09:05 -0500


I have some labeled data which classifies datasets as positive or negative. Now I have an algorithm that does the same automatically, and I want to compare the results.

I was told to use precision and recall, but I'm not sure whether those are appropriate because the true negatives don't even appear in the formulas. I would rather use a general "prediction rate" covering both positives and negatives.

What would be a good way to evaluate the algorithm? Thanks!!



Example: the data looks like this: {[some text, pos, pos]; [other txt, neg, pos]; [whatever, neg, neg]; [littlepny, pos, neg]}. Each entry is some data, then the manual annotation, then the program's output.

classification_guy ( 2013-09-26 05:15:25 -0500 )

1 answer


answered 2013-09-26 03:53:40 -0500

Guanta

There exist plenty of error measurements; however, you should take care how you combine them. Typically you don't report the true negative rate since it is implicitly covered by the other measurements, i.e. precision & recall would be totally fine, as would true positive rate & false positive rate. In some research areas sensitivity & specificity are more common than the other two. Furthermore you can a) plot your results in ROC curves, b) give the area under the curve (AUC), and c) give the F1-measure (a combination of precision & recall).
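
For reference, the measures named above have the following standard definitions in terms of confusion-matrix counts. The symbols TP, FP, TN and FN (true/false positives/negatives) are notation added here for illustration, not taken from the answer itself.

```latex
\begin{align*}
\text{precision} &= \frac{TP}{TP + FP}, &
\text{recall (sensitivity, TPR)} &= \frac{TP}{TP + FN}, \\
\text{specificity (TNR)} &= \frac{TN}{TN + FP}, &
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.
\end{align*}
```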



Thanks for your answer! I'm just a bit confused about the whole precision/recall/F1 thing because the true negatives don't count... the results don't change whether there is just one dataset correctly labeled as negative or 1000... just the false negatives are counted. Or am I wrong? Would it still be an appropriate evaluation for this task?

classification_guy ( 2013-09-26 05:14:25 -0500 )

The rates come in two groups: true positive rate and false negative rate add up to 1, and true negative rate and false positive rate also add up to 1. So it is sufficient to report one rate from each group. If you give the true negative rate, then you should also give either the true positive rate or the false negative rate.

Guanta ( 2013-09-26 05:49:20 -0500 )
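
The complement relation described in the comment above can be checked with a small sketch. The function name `rates` and the counts passed to it are hypothetical, chosen only for illustration; exact fractions are used so the sums come out as exactly 1.

```python
from fractions import Fraction


def rates(tp, fp, tn, fn):
    """Return the four per-class rates as exact fractions."""
    positives = tp + fn  # everything annotated as positive
    negatives = tn + fp  # everything annotated as negative
    return {
        "tpr": Fraction(tp, positives),  # true positive rate
        "fnr": Fraction(fn, positives),  # false negative rate
        "tnr": Fraction(tn, negatives),  # true negative rate
        "fpr": Fraction(fp, negatives),  # false positive rate
    }


# Hypothetical counts, not taken from the thread.
r = rates(tp=40, fp=10, tn=90, fn=20)

# Each group sums to 1, so one rate per group carries all the information.
assert r["tpr"] + r["fnr"] == 1
assert r["tnr"] + r["fpr"] == 1
print({name: float(value) for name, value in r.items()})
```

Because each pair is complementary, reporting for example tpr and fpr already determines fnr and tnr.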

Hmm, sorry, but I still don't get it... For example: if I have 31 data sets, of which 21 are labeled negative and 10 positive, and my algorithm labels 2 negatives as positives and 5 positives as negatives, then tp = 5, fp = 2, tn = 19, fn = 5 ...right? My instinct says that it's more appropriate to represent these results as a prediction rate of 77.4% rather than a precision of 71% and a recall of 50% ...or is my instinct wrong? Thanks!

classification_guy ( 2013-09-26 06:29:29 -0500 )
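
The numbers in the comment above can be reproduced with a short sketch. The helper name `summarize` is hypothetical, and the second call, with 1000 true negatives, is an assumed variation added to show how each measure reacts to extra correctly rejected negatives.

```python
def summarize(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts from the comment: 10 positives, 21 negatives.
base = summarize(tp=5, fp=2, tn=19, fn=5)

# Same errors, but with 1000 true negatives (hypothetical variation).
many_negatives = summarize(tp=5, fp=2, tn=1000, fn=5)

for name, result in [("base", base), ("many negatives", many_negatives)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With the original counts this gives an accuracy of about 77.4%, a precision of about 71.4% and a recall of 50%. Adding true negatives pushes accuracy to about 99.3% while precision and recall stay exactly the same.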

Sorry that I confused you, maybe I should have mentioned that there also exists the classifier score, aka accuracy (what you call prediction rate). This measures the overall classification outcome. This may be good internally for testing your classifier / classification result. However, unlike tpr (true positive rate) and fpr (false positive rate) or the other values mentioned above, it depends on the proportion of positives to negatives. So, I'll give you a different example: if you had a dataset with twice as many positives as negatives and a bad classifier that simply labels everything as positive, your accuracy would still be 66%. However, tpr and fpr would both be 100%, which exposes that the classifier does not discriminate at all (precision would also give 66%, since precision specifically looks at the proportion of true positives among all detected positives).

Guanta ( 2013-09-26 07:18:59 -0500 )
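
How class proportion interacts with these measures can be made concrete with a sketch that computes expected values for a classifier that ignores its input. The function `expected_metrics`, the dataset size (200 positives, 100 negatives) and the two guessing probabilities are assumptions made for illustration only.

```python
def expected_metrics(n_pos, n_neg, p_predict_pos):
    """Expected metrics for a classifier that ignores its input and
    answers "positive" with probability p_predict_pos (must be > 0,
    otherwise precision is undefined)."""
    tp = n_pos * p_predict_pos  # positives that happen to be accepted
    fn = n_pos - tp
    fp = n_neg * p_predict_pos  # negatives that happen to be accepted
    tn = n_neg - fp
    return {
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "tpr": tp / n_pos,
        "fpr": fp / n_neg,
        "precision": tp / (tp + fp),
    }


# Twice as many positives as negatives (hypothetical sizes).
coin_flip = expected_metrics(200, 100, 0.5)
always_positive = expected_metrics(200, 100, 1.0)

for name, result in [("coin flip", coin_flip), ("always positive", always_positive)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions tpr always equals fpr, which marks the classifier as uninformative regardless of class balance. Precision comes out at about 66.7% in both cases purely because two thirds of the data are positive, and labeling everything positive also yields an accuracy of about 66.7%.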

If we look at your example, then the two values precision and recall together contain much more information: we see directly that you fail to classify half of your positives correctly (recall of 50%), while precision says that about 71% of the items your algorithm labels as positive really are positive. Neither value can be inflated by a large number of true negatives, unlike accuracy.

Guanta ( 2013-09-26 07:23:13 -0500 )

While searching I came across this topic. I would like to add that the reason why computer vision uses precision-recall more frequently than ROC curves is exactly the true negative case. If you are evaluating a multi-scale sliding-window-based approach, your true negatives will outnumber all other values by orders of magnitude, making it impossible to draw decent ROC curves. Therefore PR solves the problem for you!

StevenPuttemans ( 2015-02-12 02:15:52 -0500 )
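
The effect described in the comment above can be illustrated with made-up numbers. The counts below (one million windows, 100 true objects) and the helper `roc_and_pr_point` are hypothetical and only meant as a sketch of why the ROC axis becomes hard to read when true negatives dominate.

```python
def roc_and_pr_point(tp, fp, tn, fn):
    """Coordinates of one operating point in ROC space and PR space."""
    return {
        "tpr": tp / (tp + fn),        # ROC y-axis, identical to recall
        "fpr": fp / (fp + tn),        # ROC x-axis
        "precision": tp / (tp + fp),  # PR y-axis
    }


# Two hypothetical detectors evaluated on 1,000,000 windows, 100 of which
# contain an object. Both find 90 objects; they differ in false alarms.
good = roc_and_pr_point(tp=90, fn=10, fp=50, tn=999_850)
poor = roc_and_pr_point(tp=90, fn=10, fp=900, tn=999_000)

for name, point in [("good", good), ("poor", poor)]:
    print(name, {key: round(value, 5) for key, value in point.items()})
```

Both detectors have an fpr below 0.001, so on a linear ROC axis their points nearly coincide, while their precision differs by roughly a factor of seven.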

Very good point, we recently ran into the same issue at the lab. Exactly, if you don't have a balanced set of positives and negatives, precision-recall curves are preferable. However, I want to add that you can still use ROC curves if you switch to a logarithmic scale.

Guanta ( 2015-02-12 10:23:05 -0500 )


Last updated: Sep 26 '13