Darknet-Yolo - extract data from Dnn.forward

asked 2018-12-20 04:55:24 -0500

Arryyyy

updated 2018-12-20 05:40:11 -0500

Hi, I'm trying to write an image classifier using yolov3. All I found are examples in Python/C++. So my question is: how can I extract labels and bounding boxes from Dnn.forward?

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.dnn.Dnn;
import org.opencv.dnn.Net;
import org.opencv.imgcodecs.Imgcodecs;

import java.util.ArrayList;
import java.util.List;

public class Classifier  {

    private static List<String> getOutputNames(Net net) {
        List<String> names = new ArrayList<>();

        List<Integer> outLayers = net.getUnconnectedOutLayers().toList();
        List<String> layersNames = net.getLayerNames();

        outLayers.forEach((item) -> names.add(layersNames.get(item - 1)));
        return names;
    }

    public static void main(String[] args) {

        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        String modelWeights = "yolov3.weights";
        String modelConfiguration = "yolov3.cfg";

        Net net = Dnn.readNetFromDarknet(modelConfiguration, modelWeights);

        Imgcodecs imageCodecs = new Imgcodecs();
        Mat image = imageCodecs.imread("video/sample.png");

        Mat blob = Dnn.blobFromImage(image, 1.0, new Size(416, 416), new Scalar(0), false, false);
        net.setInput(blob);

        List<Mat> result = new ArrayList<>();
        List<String> outBlobNames = getOutputNames(net);

        net.forward(result, outBlobNames);

        outBlobNames.forEach(System.out::println);
        result.forEach(System.out::println);
    }
}

This produces

yolo_82
yolo_94
yolo_106
Mat [ 507*85*CV_32FC1, isCont=true, isSubmat=true, nativeObj=0x7f991f39a480, dataAddr=0x7f9892adf040 ]
Mat [ 2028*85*CV_32FC1, isCont=true, isSubmat=true, nativeObj=0x7f991f398b80, dataAddr=0x7f991eb20280 ]
Mat [ 8112*85*CV_32FC1, isCont=true, isSubmat=true, nativeObj=0x7f991f236880, dataAddr=0x7f98a01af040 ]

How should I process this data?


Comments

Take a look at object_detection.py and object_detection.cpp. This Java sample can also be useful: https://github.com/opencv/opencv/blob... (note that it works with an SSD object detection network).

dkurt ( 2018-12-20 06:32:22 -0500 )

btw, your scale should be 0.00392, and it needs swapRB=true

berak ( 2018-12-20 07:14:15 -0500 )

1 answer

answered 2018-12-20 07:24:38 -0500

berak

updated 2018-12-20 07:30:09 -0500

yolov3 has "region proposals", so each row in your output Mats represents a candidate detection.

the first 4 numbers are [center_x, center_y, width, height] (relative to the image size), followed by 1 objectness score and then (N-5) class probabilities, where N is the number of columns (85 here).

first we need to collect all candidates from all outputs (scales), then we can apply NMS (non-maximum suppression) to retain only the most promising ones.

i only have yolov3-tiny here, so the output varies a little (it has only 2 output layers), but it should work in the same way for the "larger" one:

import org.opencv.core.Core;
import org.opencv.core.*;
import org.opencv.dnn.*;
import org.opencv.utils.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

import java.util.ArrayList;
import java.util.List;

public class Yolov3 {

    private static List<String> getOutputNames(Net net) {
        List<String> names = new ArrayList<>();

        List<Integer> outLayers = net.getUnconnectedOutLayers().toList();
        List<String> layersNames = net.getLayerNames();

        outLayers.forEach((item) -> names.add(layersNames.get(item - 1)));
        return names;
    }

    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        String modelWeights = "c:/data/mdl/yolo/yolov3-tiny.weights";
        String modelConfiguration = "c:/data/mdl/yolo/yolov3-tiny.cfg";

        Net net = Dnn.readNetFromDarknet(modelConfiguration, modelWeights);

        Mat image = Imgcodecs.imread("dog.jpg");
        Size sz = new Size(416, 416);
        Mat blob = Dnn.blobFromImage(image, 0.00392, sz, new Scalar(0), true, false);
        net.setInput(blob);

        List<Mat> result = new ArrayList<>();
        List<String> outBlobNames = getOutputNames(net);

        net.forward(result, outBlobNames);

        outBlobNames.forEach(System.out::println);
        result.forEach(System.out::println);

        float confThreshold = 0.6f;
        List<Integer> clsIds = new ArrayList<>();
        List<Float> confs = new ArrayList<>();
        List<Rect> rects = new ArrayList<>();
        for (int i = 0; i < result.size(); ++i)
        {
            // each row is a candidate detection: the first 4 numbers are
            // [center_x, center_y, width, height], then 1 objectness score (column 4),
            // followed by the class probabilities (column 5 onwards)
            Mat level = result.get(i);
            for (int j = 0; j < level.rows(); ++j)
            {
                Mat row = level.row(j);
                Mat scores = row.colRange(5, level.cols());
                Core.MinMaxLocResult mm = Core.minMaxLoc(scores);
                float confidence = (float)mm.maxVal;
                Point classIdPoint = mm.maxLoc;
                if (confidence > confThreshold)
                {
                    int centerX = (int)(row.get(0,0)[0] * image.cols());
                    int centerY = (int)(row.get(0,1)[0] * image.rows());
                    int width   = (int)(row.get(0,2)[0] * image.cols());
                    int height  = (int)(row.get(0,3)[0] * image.rows());
                    int left    = centerX - width  / 2;
                    int top     = centerY - height / 2;

                    clsIds.add((int)classIdPoint.x);
                    confs.add((float)confidence);
                    rects.add(new Rect(left, top, width, height));
                }
            }
        }

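        // Apply non-maximum suppression procedure.
        float nmsThresh = 0.5f;
        if (confs.isEmpty()) {
            // nothing passed confThreshold: vector_float_to_Mat() would return an empty Mat,
            // and the MatOfFloat constructor then fails with "m.dims >= 2"
            System.out.println("no detections above confThreshold");
            return;
        }
        MatOfFloat confidences = new MatOfFloat(Converters.vector_float_to_Mat(confs));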
        Rect[] boxesArray = rects.toArray(new Rect[0]);
        MatOfRect boxes = new MatOfRect(boxesArray);
        MatOfInt indices = new MatOfInt();
        Dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThresh, indices);

        // Draw result boxes:
        int [] ind = indices.toArray();
        for (int i = 0; i < ind.length; ++i)
        {
            int idx = ind[i];
            Rect box = boxesArray[idx];
            Imgproc.rectangle(image, box.tl(), box.br(), new Scalar(0,0,255), 2);
            System.out.println(box);
        }
        Imgcodecs.imwrite("out.png", image);
    }
}

(it missed the bike, but probably better threshold values and a better model can mend that)
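
For readers wondering what the NMS step actually does to the collected candidates, here is a minimal sketch in plain Java. It is a simple greedy NMS written for illustration, not a copy of what Dnn.NMSBoxes does internally; the class name NmsSketch, the {left, top, width, height} box layout and the sample boxes are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NmsSketch {

    // Intersection-over-union of two boxes given as {left, top, width, height}.
    static float iou(int[] a, int[] b) {
        int x1 = Math.max(a[0], b[0]);
        int y1 = Math.max(a[1], b[1]);
        int x2 = Math.min(a[0] + a[2], b[0] + b[2]);
        int y2 = Math.min(a[1] + a[3], b[1] + b[3]);
        int inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
        int union = a[2] * a[3] + b[2] * b[3] - inter;
        return union <= 0 ? 0f : (float) inter / union;
    }

    // Greedy NMS: visit boxes from highest to lowest score and keep a box only
    // if it does not overlap an already kept box by more than nmsThresh.
    static List<Integer> nms(List<int[]> boxes, List<Float> scores, float nmsThresh) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < boxes.size(); ++i) order.add(i);
        order.sort((i, j) -> Float.compare(scores.get(j), scores.get(i)));

        List<Integer> keep = new ArrayList<>();
        for (int idx : order) {
            boolean suppressed = false;
            for (int k : keep) {
                if (iou(boxes.get(idx), boxes.get(k)) > nmsThresh) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) keep.add(idx);
        }
        return keep;
    }

    public static void main(String[] args) {
        List<int[]> boxes = new ArrayList<>();
        boxes.add(new int[] {100, 100, 200, 200});
        boxes.add(new int[] {110, 105, 200, 200}); // near-duplicate of box 0
        boxes.add(new int[] {400, 50, 80, 120});   // separate object
        List<Float> scores = Arrays.asList(0.9f, 0.7f, 0.8f);
        System.out.println(nms(boxes, scores, 0.5f)); // prints [0, 2]
    }
}
```

Box 1 overlaps box 0 heavily and has the lower score, so it is dropped; box 2 does not overlap anything and survives. A lower nmsThresh suppresses more aggressively.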


Comments

Thanks :D This works great! One more question: how can I get the class names, like 'dog', 'bike' etc.?

Arryyyy ( 2018-12-20 07:42:09 -0500 )

yea, i left that out (lazy :). the classnames are here; once you've parsed that, classnames[clsIds[idx]] should be it.

berak ( 2018-12-20 07:49:05 -0500 )
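
A rough sketch of the class-name lookup described in the comment above, in plain Java. It assumes the names file has one class name per line with the line index equal to the class id; the ClassNames class, its helpers and the file name "coco.names" are hypothetical and only here for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClassNames {

    // One class name per line; the line index is the class id.
    static List<String> parse(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) names.add(line.trim());
        // Drop trailing blank lines so they do not become empty class names.
        while (!names.isEmpty() && names.get(names.size() - 1).isEmpty()) {
            names.remove(names.size() - 1);
        }
        return names;
    }

    // Hypothetical helper: reads a names file such as "coco.names" from disk.
    static List<String> load(String path) throws IOException {
        return parse(Files.readAllLines(Paths.get(path)));
    }

    // Maps a class id to its name, falling back to the raw id if out of range.
    static String label(List<String> names, int classId) {
        return (classId >= 0 && classId < names.size()) ? names.get(classId) : "class " + classId;
    }

    public static void main(String[] args) {
        // Inline stand-in for the first lines of a names file.
        List<String> names = parse(Arrays.asList("person", "bicycle", "car", ""));
        System.out.println(label(names, 1)); // prints bicycle
    }
}
```

In the drawing loop of the answer this would be used as ClassNames.label(names, clsIds.get(idx)), since clsIds is a List<Integer> there rather than an array.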

In yolov3 we have 3 output layers with shapes (17328, 85), (4332, 85) and (1083, 85).

The yolo paper says: "In our experiments with COCO [10] we predict 3 boxes at each scale so the tensor is N x N x [3 x (4+1+80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions."

But 17328 = 76 * 76 * 3, 4332 = 38 * 38 * 3 and 1083 = 19 * 19 * 3. What is N here, what is the 3, and what are the 3 layers?

Maybe I have a problem with my English.

LBerger ( 2019-05-29 21:15:56 -0500 )
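
On the question above: as far as I understand it, N is the side length of the prediction grid at one scale, the 3 is the number of boxes predicted per grid cell, and the 3 layers are the three scales. The full yolov3 network downsamples the input by 32, 16 and 8, so each output Mat has N * N * 3 rows with N = input size / stride. A small sketch of that arithmetic (the YoloGrid class and method name are made up for this example):

```java
import java.util.Arrays;

class YoloGrid {

    // Rows of each yolov3 output Mat for a square network input of inputSize pixels.
    static int[] rowsPerOutput(int inputSize) {
        int[] strides = {32, 16, 8}; // downsampling factor of each output scale
        int boxesPerCell = 3;        // boxes predicted per grid cell
        int[] rows = new int[strides.length];
        for (int i = 0; i < strides.length; ++i) {
            int n = inputSize / strides[i]; // N: grid side length at this scale
            rows[i] = n * n * boxesPerCell;
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rowsPerOutput(416))); // prints [507, 2028, 8112]
        System.out.println(Arrays.toString(rowsPerOutput(608))); // prints [1083, 4332, 17328]
    }
}
```

The 416 case matches the three Mats printed in the question, and the 608 case matches the shapes quoted in the comment, which suggests that network was run with a 608x608 input. Each row then holds the 4 + 1 + 80 = 85 values for one box.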

Hi, many thanks for this answer, I was looking for the same. However, when I run this code I get the following error. Can you please help?

yolo_16
yolo_23
Mat [ 507*85*CV_32FC1, isCont=true, isSubmat=true, nativeObj=0x2215dd30, dataAddr=0x21daf080 ]
Mat [ 2028*85*CV_32FC1, isCont=true, isSubmat=true, nativeObj=0x2215e970, dataAddr=0x220600c0 ]
Exception in thread "main" CvException [org.opencv.core.CvException: cv::Exception: OpenCV(4.1.0) C:\build\master_winpack-bindings-win64-vc14-static\opencv\modules\core\src\matrix.cpp:406: error: (-215:Assertion failed) m.dims >= 2 in function 'cv::Mat::Mat'
]
    at org.opencv.core.Mat.n_Mat(Native Method)
    at org.opencv.core.Mat.<init>(Mat.java:113)
    at org.opencv.core.MatOfFloat.<init>(MatOfFloat.java:27)
    at sample.yolo2.main(yolo2.java:115)

suddh123 ( 2019-06-22 04:17:51 -0500 )

Suddh, it may be because an incorrect size was supplied during image preprocessing. Either a) make sure you download dog.jpg from https://github.com/pjreddie/darknet/t... and use the 416x416 size specified in this example, or b) provide a size that is appropriate for the image you are using.

Vivu ( 2019-07-31 04:50:20 -0500 )