
Having trouble using VGG16 to detect objects in a video

asked 2018-01-06 00:12:16 -0600 by Quo, updated 2018-01-06 01:01:39 -0600

Hello all,

Following this tutorial: https://www.pyimagesearch.com/2017/08... - we can see the author is using the MobileNet-SSD network architecture. I've found this network to be very hit and miss; it fails to detect even the most basic objects from a car dash cam (other cars, trucks, etc.).

I have altered the code to use the VGG16 network, but I can't seem to get it working, and I suspect it has something to do with the way I'm passing the image as a blob via the cv2.dnn.blobFromImage() method. I'm resizing my image to the required 224 x 224 format, but it's still producing an error.
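
For reference, this is the minimal check I have in mind for the blob itself (the 224 x 224 size and the BGR mean values are taken from the standard VGG16 deploy prototxt, so they may not match the fine-tuned model):

import cv2
import numpy as np

# stand-in frame; any BGR image read from the video would do
frame = np.zeros((375, 500, 3), dtype=np.uint8)

# same call as in the full script below: no scaling, resize to 224x224,
# subtract the ImageNet BGR means commonly used with VGG16
blob = cv2.dnn.blobFromImage(frame, 1.0, (224, 224), (104, 117, 123))

# for a single image this should print (1, 3, 224, 224), which has to
# match the input_dim entries at the top of the deploy prototxt
print(blob.shape)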

The goal of my code at the moment is simply to convert each frame to a blob and print the return value of net.forward().

import imutils
import numpy as np
import argparse
import cv2
import time

ap = argparse.ArgumentParser()
ap.add_argument('-v', '--video', required=True,
    help='path to input video')
ap.add_argument('-p', '--prototxt', required=True,
    help='path to Caffe deploy prototxt file')
ap.add_argument('-m', '--model', required=True,
    help='path to Caffe pre-trained model')
ap.add_argument('-l', '--labels', required=True,
    help='path to ImageNet labels')
ap.add_argument('-c', '--confidence', type=float, default=0.5,
    help='minimum probability to filter weak detections')
args = vars(ap.parse_args())

# load the ImageNet class labels (synset_words.txt: id followed by comma-separated names)
rows = open(args['labels']).read().strip().split('\n')
classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]

# one random colour per class (only needed if boxes are drawn later)
COLORS = np.random.uniform(0, 255, size=(len(classes), 3))

# load the serialized Caffe model from disk
net = cv2.dnn.readNetFromCaffe(args['prototxt'], args['model'])

video = cv2.VideoCapture(args['video'])
while True:
    (grabbed, frame) = video.read()

    if not grabbed:
        break

    frame = imutils.resize(frame, width=500)
    frameClone = frame.copy()

    (h, w) = frame.shape[:2]
    # build a 224x224 blob from the frame, subtracting the usual VGG BGR means
    blob = cv2.dnn.blobFromImage(frame, 1, (224, 224), (104, 117, 123))

    net.setInput(blob)
    # the error is produced on the line below - see the bottom for the error output
    detections = net.forward()
    print(detections)
    wait = input()  # pause so the printed output can be inspected per frame
    # indices of the five highest-scoring classes
    idxs = np.argsort(detections[0])[::-1][:5]

    for (i, idx) in enumerate(idxs):
        # annotate the frame with the top prediction only
        if i == 0:
            label = "Label: {}, {:.2f}%".format(classes[idx],
                detections[0][idx] * 100)
            cv2.putText(frameClone, label, (5, 25), cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (0, 0, 255), 2)

    cv2.imshow('Video', frameClone)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()

As noted in the comment above, the code breaks at detections = net.forward(). The (lengthy) error output is as follows:

PS C:\Users\Quo\OneDrive\Projects\Python\Computer Vision\Work\Video> python .\deep_learning_object_detection
_video.py -v .\videos\1522-front.mp4 -p .\prototxt\VGG_ILSVRC_16_layers_deploy.prototxt.txt -m C:\Users\Quo\
Downloads\VGG16_SOD_finetune.caffemodel -l .\labels\synset_words.txt
[libprotobuf WARNING D:\Build\OpenCV\opencv-3.3.1\3rdparty\protobuf\src\google\protobuf\io\coded_stream.cc:605] 
Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing 
will be halted for security reasons.  To increase the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING D:\Build\OpenCV\opencv-3.3.1\3rdparty\protobuf\src ...
(rest of the error output truncated)
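
For completeness, a minimal snippet to list what cv2.dnn actually parses from this prototxt / caffemodel pair (just the generic getLayerNames() and getUnconnectedOutLayers() accessors, nothing model-specific):

import cv2

# same prototxt / caffemodel pair as in the command line above
net = cv2.dnn.readNetFromCaffe('VGG_ILSVRC_16_layers_deploy.prototxt.txt',
                               'VGG16_SOD_finetune.caffemodel')

# print every layer name that was parsed, plus the indices of the
# unconnected (output) layers, to compare the prototxt against the weights
for name in net.getLayerNames():
    print(name)
print(net.getUnconnectedOutLayers())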

1 answer


answered 2018-01-08 07:26:32 -0600 by dkurt

Hi, @Quo!

I've found this network to be very hit and miss; it fails to detect even the most basic objects from a car dash cam (other cars, trucks, etc.).

This is normal for any trainable algorithm: performance depends on the data in the training dataset. For example, if it was trained to detect cars seen from the front, it's hard to predict how it will perform on images of cars seen from the side.

I have altered the code to use the VGG16 network, but I can't seem to get it working...

The original VGG16 model is an image classification network, not an object detection network. Please clarify what kind of model you are using.
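
To illustrate the difference, a rough sketch of how the two kinds of outputs are usually parsed (the shapes assume a standard ImageNet classifier and a Caffe SSD DetectionOutput layer, so treat them as assumptions about your model):

import numpy as np

def parse_classification(out):
    # classification net (plain VGG16): out has shape (1, num_classes),
    # one softmax score per class, so take the arg-max (or the top-5)
    idx = int(np.argmax(out[0]))
    return idx, float(out[0][idx])

def parse_detections(out, conf_thr=0.5):
    # detection net (e.g. MobileNet-SSD): out has shape (1, 1, K, 7), one row
    # per detection: [batchId, classId, confidence, left, top, right, bottom],
    # with the box coordinates relative to the input image size
    boxes = []
    for det in out[0, 0]:
        if det[2] > conf_thr:
            boxes.append((int(det[1]), float(det[2]), det[3:7]))
    return boxes

So if the caffemodel is a fine-tuned classifier, your script should treat net.forward() as a score vector; to get bounding boxes you need a detection architecture (SSD, Faster R-CNN, ...) built on top of VGG16, not the plain classification network.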

