Probability of correct classification using LDA?

I recently asked a question here about how to use LDA, and berak patiently showed me how to predict the label for a new image from the closest value in my projection. What I would like to do now is work out the probability of each prediction from the clusters that my groups have formed. I tried this out with OpenCV + Python + scikit-learn and the results were very encouraging:

[image]

Does anyone know how to do something similar in C++ and OpenCV, which is what my main program is written in? The main LDA functionality I have so far looks like this (thanks to all the help berak gave me):

Training Stage: I have 3 folders, one for each of my 3 tag types. I flatten the first image to initialise my matrix, then keep pushing the remaining flattened images onto it until I have a matrix where each row comes from a separate image of a tag.

// Load the first image as grayscale (flag 0) and flatten it into a single row.
cv::Mat initial_image = cv::imread("/Users/u5305887/Desktop/tags/I/0.jpg", 0);
cv::Mat trainData = initial_image.reshape(1, 1);

// Append the remaining "I" tags, one flattened image per row.
for (int i=1; i < 938; i++)
{
    std::string filename = "/Users/u5305887/Desktop/tags/I/";
    filename = filename + std::to_string(i);
    filename = filename + ".jpg";
    cv::Mat image = cv::imread(filename, 0);
    cv::Mat flat_image = image.reshape(1,1);
    trainData.push_back(flat_image);
}

for (int i=0; i < 977; i++)
{
    std::string filename = "/Users/u5305887/Desktop/tags/O/";
    filename = filename + std::to_string(i);
    filename = filename + ".jpg";
    cv::Mat image = cv::imread(filename, 0);
    cv::Mat flat_image = image.reshape(1,1);
    trainData.push_back(flat_image);
}

for (int i=0; i < 457; i++)
{
    std::string filename = "/Users/u5305887/Desktop/tags/Q/";
    filename = filename + std::to_string(i);
    filename = filename + ".jpg";
    cv::Mat image = cv::imread(filename, 0);
    cv::Mat flat_image = image.reshape(1,1);
    trainData.push_back(flat_image);
}

// One label per row of trainData, so the label count must equal trainData.rows
// (the loops above load 938 + 977 + 457 = 2372 images).
cv::Mat trainLabels = (cv::Mat_<int>(1,2376) << 1, 1, 1, 1, 1, 1, 1); // 1D matrix of labels (most of the list deleted for brevity)
int C = 3; // 3 tag types
int num_components = (C-1);

cv::LDA lda(num_components);
lda.compute(trainData, trainLabels); // compute eigenvectors
cv::Mat projected = lda.project(trainData);

Later, when I have a new tag, this is how berak advised me to predict its type:

cv::Mat roi; // tag to classify
cv::Mat roi_flat = roi.reshape(1,1);
cv::Mat proj_tag = lda.project(roi_flat);

int bestId = -1;
double bestDist = 999999999.9;
for (int i=0; i<projected.rows; i++)
{
    double d = cv::norm( projected.row(i), proj_tag);
    if (d < bestDist) // keep the closest training sample seen so far
    {
        bestDist = d;
        bestId = i;
    }
}
int predicted = trainLabels.at<int>(bestId);

Does anyone have advice on how I can work out the probability here instead so I can predict which tag type it should be based on the information about the clusters, rather than the nearest value (which may be an outlier)?
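
To make the question concrete, here is a minimal sketch of the kind of calculation I have in mind. It is plain C++ with no OpenCV calls, and it rests on assumptions that may not hold for my data: each cluster in the 2D projected space is treated as a Gaussian, all clusters share one pooled covariance, and the class priors are the training frequencies. `Point2` and `classPosteriors` are hypothetical names I made up, not OpenCV functions, and the labels are assumed to be remapped to 0 .. numClasses-1.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One LDA-projected sample: C - 1 = 2 coordinates for 3 tag types.
using Point2 = std::array<double, 2>;

// Hypothetical helper (not an OpenCV function). Returns P(class | query) for
// each class, assuming Gaussian clusters that share one pooled covariance.
// Labels are assumed to be remapped to 0 .. numClasses-1.
std::vector<double> classPosteriors(const std::vector<Point2>& points,
                                    const std::vector<int>& labels,
                                    int numClasses,
                                    const Point2& query)
{
    assert(numClasses > 0 && points.size() == labels.size());
    const std::size_t n = points.size();

    // Per-class means and sample counts.
    std::vector<Point2> mean(numClasses, Point2{0.0, 0.0});
    std::vector<double> count(numClasses, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(labels[i] >= 0 && labels[i] < numClasses);
        mean[labels[i]][0] += points[i][0];
        mean[labels[i]][1] += points[i][1];
        count[labels[i]] += 1.0;
    }
    for (int c = 0; c < numClasses; ++c) {
        assert(count[c] > 0.0);
        mean[c][0] /= count[c];
        mean[c][1] /= count[c];
    }

    // Pooled within-class covariance (symmetric 2x2: sxx, sxy, syy).
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = points[i][0] - mean[labels[i]][0];
        const double dy = points[i][1] - mean[labels[i]][1];
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    const double dof = static_cast<double>(n) - numClasses;
    assert(dof > 0.0);
    sxx /= dof;
    sxy /= dof;
    syy /= dof;
    const double det = sxx * syy - sxy * sxy;
    assert(det > 0.0); // the pooled covariance must be invertible

    // Unnormalised log-posterior: log prior - 0.5 * squared Mahalanobis distance.
    // The Gaussian normalising constant is shared by all classes, so it cancels.
    std::vector<double> post(numClasses);
    for (int c = 0; c < numClasses; ++c) {
        const double dx = query[0] - mean[c][0];
        const double dy = query[1] - mean[c][1];
        const double maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det;
        post[c] = std::log(count[c] / static_cast<double>(n)) - 0.5 * maha;
    }

    // Softmax, shifted by the maximum for numerical stability.
    const double maxScore = *std::max_element(post.begin(), post.end());
    double sum = 0.0;
    for (int c = 0; c < numClasses; ++c) {
        post[c] = std::exp(post[c] - maxScore);
        sum += post[c];
    }
    for (int c = 0; c < numClasses; ++c) {
        post[c] /= sum;
    }
    return post;
}
```

In my program I would presumably copy each row of `projected` into a `Point2`, pass the matching labels, and use the two values of `proj_tag` as `query`. The predicted tag would then be the class with the largest posterior, and that posterior would be the confidence I am looking for.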