Using HOG features to update bag of words background model is too computationally intensive

asked 2017-07-31 22:48:20 -0600

benweinstein

I am trying to build a visual bag-of-words (BOW) model using HOG descriptors. This is part of a patch-level background model for foreground-background selection in cluttered video. I am following this paper here. The paper describes patches of 32 * 32, but is vague about the rest of the HOG properties (see below).

I am using the OpenCV 3.2 binary on Windows, with Python 3.5.

I first create a BOW class with a dictionary size 128:

self.BOW=cv2.BOWKMeansTrainer(128)

I'll be using HOG descriptors with the following parameters (but see below).

#HOG descriptor
winSize = (32,32)
blockSize = (16,16)
blockStride = (8,8)
cellSize = (8,8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 2.0000000000000001e-01
gammaCorrection = 0
nlevels = 32
self.calc_HOG = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins,derivAperture,winSigma,
                        histogramNormType,L2HysThreshold,gammaCorrection,nlevels)
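For reference, the length of a single HOG descriptor follows directly from these parameters: blocks per window, times cells per block, times bins. The helper below (`hog_descriptor_length` is a hypothetical name, not an OpenCV function) is a sketch of that arithmetic. With OpenCV available, `self.calc_HOG.getDescriptorSize()` should report the same figure.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of floats in one HOG descriptor for a single window."""
    # Block positions that fit inside the window along each axis.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block holds a grid of cells, and each cell contributes nbins values.
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins

# Parameters from the question: 3 x 3 blocks, 4 cells per block, 9 bins.
length = hog_descriptor_length((32, 32), (16, 16), (8, 8), (8, 8), 9)
print(length)  # 324
```

So one 32 x 32 window yields 324 values. Anything much longer than that means the descriptor was computed over many windows at once.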

I am using MOG background subtraction to generate my initial background proposal, which I then feed into the BOW model for patch-level features.

An image has the following shape

self.bg_image.shape
(509, 905, 3)

So for each frame, if it has been classified as background by MOG, I add the HOG feature to the BOW model:

self.BOW.add(self.calc_HOG.compute(self.bg_image))

and I'll go along like that until a potential foreground object needs to be checked.

When a frame is potentially foreground, I cluster the BOW descriptors

self.background_vocab=self.BOW.cluster()

Generate an extractor

self.extract_bow = cv2.BOWImgDescriptorExtractor(self.calc_HOG, self.matcher)

create a vocabulary

self.extract_bow.setVocabulary(self.background_vocab)

and compare the histogram of the current crop to the corresponding part of the background image using the HOG-clustered visual words.

current_BOWhist=self.extract_bow.compute(current)

print("Extract background HOG feature")
background_BOWhist=self.extract_bow.compute(background)

print("Compare Background and Foreground Masks")
BOW_dist =cv2.compareHist(current_BOWhist, background_BOWhist, method=cv2.HISTCMP_CHISQR)
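For reference, OpenCV documents `cv2.HISTCMP_CHISQR` as the sum over bins of (H1 - H2)^2 / H1. A NumPy sketch of that formula is below. The `chi_square` helper is a hypothetical name, the two histograms are made-up values, and skipping bins where the first histogram is empty is an assumption made here to avoid division by zero.

```python
import numpy as np

def chi_square(h1, h2):
    """Chi-square distance: sum over bins of (h1 - h2)^2 / h1."""
    h1 = np.asarray(h1, dtype=np.float64).ravel()
    h2 = np.asarray(h2, dtype=np.float64).ravel()
    nonzero = h1 > 0  # assumption: ignore bins that are empty in h1
    return float(np.sum((h1[nonzero] - h2[nonzero]) ** 2 / h1[nonzero]))

# Two made-up, normalized 4-word BOW histograms.
current = [0.5, 0.25, 0.25, 0.0]
background = [0.25, 0.25, 0.5, 0.0]

same = chi_square(current, current)
dist = chi_square(current, background)
print(same)  # 0.0
print(dist)  # 0.0625/0.5 + 0 + 0.0625/0.25 = 0.375
```

Identical histograms give 0, and larger values mean the patch looks less like the background.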

The memory use and performance of this often-cited strategy seem completely unusable. The BOW.cluster() method, documented here, is incredibly memory intensive and basically locks up the computer. Even on just a few descriptors it takes 10 to 20 seconds per call. This is especially perplexing because the above paper, and many similar papers, specify that the background model is continuously updated. That would mean many calls to cluster(), generating a new vocabulary every time a frame is classified as background and its HOG features are added.

So in my example file, I go through 30 frames of adding HOG features

a=self.BOW.getDescriptors()
len(a)
30

But this contains a huge number of descriptors.

self.BOW.descriptorsCount()
576979200

I think this is the problem: it seems like far too many points for k-means to process. Is there something wrong with my HOG descriptor properties? From the cited paper above:

"The BoW dictionary size is 128. During the verification step, the MSDPs are resized according to their aspect ratios to make sure all patches contain roughly the same amount of pixels. In order to tackle with the various aspect ratios, which is inevitable during segmentation, we let ... (more)


Comments


amazing that you got that far, even. opencv's BOW functions are meant to be used with features2d extractors/descriptors, not with HOG at all.

it's an absolute mystery to me, why this did not crash (LOUD):

self.extract_bow = cv2.BOWImgDescriptorExtractor(self.calc_HOG, self.matcher)
berak (2017-08-01 00:17:55 -0600)

problem 1 might be here:

self.BOW.add(self.calc_HOG.compute(self.bg_image))

that's the (509, 905, 3) image ? (not a 32x32 grayscale patch ?)

if so, it will generate insanely long features (also in the wrong shape)

what's the shape of self.background_vocab ?

berak (2017-08-01 00:39:52 -0600)
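The figure reported by `descriptorsCount()` is consistent with this. `BOWTrainer.add` counts every row of the matrix it receives as one descriptor, and `HOGDescriptor.compute` on a whole frame returns one long column of floats. The sketch below only redoes the arithmetic with the numbers from the question. The 324 is the per-window descriptor length implied by the posted HOG parameters (3 x 3 blocks, 4 cells per block, 9 bins).

```python
total_rows = 576979200   # self.BOW.descriptorsCount() from the question
frames = 30              # len(self.BOW.getDescriptors())
floats_per_window = 324  # descriptor length for the posted HOG parameters

# Rows contributed by each frame, and how many 32x32 windows that corresponds to.
rows_per_frame = total_rows // frames
windows_per_frame, remainder = divmod(rows_per_frame, floats_per_window)
print(rows_per_frame, windows_per_frame, remainder)  # 19232640 59360 0
```

Each frame therefore contributed about 19 million rows of width 1, so k-means was handed roughly 577 million one-dimensional samples rather than a few thousand 324-dimensional ones.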

Thanks @berak. Would it be best for me to just write my own BOW routine for HOG features? And yes, you've hit on an ongoing question I have: the authors refer to 32 * 32 "patches", but it is unclear whether those "patches" are actually the "windows" of the HOG descriptor, or whether you first cut the image into patches and perform HOG calculations on each patch.

benweinstein (2017-08-01 11:02:18 -0600)

1 answer


answered 2017-08-01 23:14:17 -0600

berak

updated 2017-08-02 00:35:34 -0600

would it be best for me to just make my own BOW routine for a hog feature ?

i'm afraid, you have to. a HOGDescriptor is not a FeatureExtractor.

The authors refer to 32 * 32 "patches",

yes, you have to sample your image with a grid of patches, then compute a hog descriptor per patch. (those are actually column vectors, so you'll have to reshape() them to a row and stack them vertically.) so with 10x10 patches, you should have a Mat with 100 rows and 324 columns per image (324 is the descriptor length for the HOG parameters in your question: 3x3 blocks * 4 cells * 9 bins). for 30 images, that's 3000 rows x 324 cols. then you can throw that into kmeans and reduce it to 128 rows and 324 cols; that's your vocabulary.
