Hi,

I was sifting through BMVC 2012 papers (one of the top computer vision conferences, held just a few days ago) and noticed one work relevant to your problem: Hierarchical Classification for Live Fish Recognition by P. Huang, B. Boom and R. Fisher (University of Edinburgh). It was presented as a poster at the UK computer vision PhD student workshop right after BMVC, on Friday, 7 September.

Unfortunately, I was able to find only two partial photos of this poster ([1], [2]), but this work is part of the much bigger Fish4Knowledge project. Their site contains a lot of useful information on the analysis of underwater video data, especially the report D1.1 Fish Detection and Tracking, which describes their algorithms.

Using this report and the poster, you can reconstruct their approach:

  1. First of all, fish are detected as moving objects using motion-based background/foreground segmentation (by the way, the OpenCV video module provides algorithms for this).

  2. Detected moving blobs are filtered to exclude false positives. Several features are used for this:

    • Difference of color at the object boundary (colors inside and outside a fish should differ)

    • Difference of motion vectors at the object boundary (motion vectors inside and outside a fish should differ)

    • Internal color homogeneity (a fish is assumed to be monochromatic due to the low quality of the videos)

    • Internal motion homogeneity (motion vectors inside a fish should be fairly uniform)

    • Internal texture homogeneity (a fish is assumed to have uniform texture due to the low quality of the videos)

  3. Tracking of detected fish is based on the paper Covariance tracking using model update based on Lie algebra by F. Porikli, O. Tuzel and P. Meer (CVPR 2006). A feature vector is computed for each pixel of a detected object; it consists of the pixel coordinates, the RGB and hue values, and the mean and standard deviation of the histogram of the 5×5 window containing the pixel. A covariance matrix of these feature vectors is computed, and it is later used to find the tracked fish among all detected objects so that tracking can continue.

  4. The plausibility of each trajectory is evaluated to filter out tracking errors. Several features are used:

    • Difference in shape ratio between frames (the width-to-height ratio of a fish shouldn't change much between two consecutive frames)

    • Histogram difference (histograms of a fish should be similar between two consecutive frames)

    • Direction smoothness (the trajectory shouldn't have sudden short-term changes)

    • Speed smoothness (the speed of a fish should be similar to the average speed over its history)

    • Texture difference (the texture of a fish should be similar between two consecutive frames)

    • Temporal persistence (a fish should be present in the video for more than just a few frames)

  5. If a fish is detected and tracked successfully, its species is recognized. First, the head of the fish is located as the smoother part (compared to the tail), and the fish is rotated to a canonical position (tail on the left, head on the right, horizontally aligned) to simplify recognition. Recognition uses many features describing the color, texture and boundary of the fish (I can see SIFT, histograms and curvature on the poster), and it is performed in a hierarchical fashion. Details of this step are sparse, so it is probably better to ask Xuan Huang directly (his email is in the upper right corner of the poster).
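
The motion-based detection in step 1 can be sketched with a toy running-average background model (this is not the actual Fish4Knowledge detector; the synthetic frames, threshold and learning rate below are made up for illustration — OpenCV's background subtractors are far more robust):

```python
import numpy as np

def update_and_detect(frame, bg_model, alpha=0.02, thresh=30.0):
    """One step of running-average background subtraction.
    Returns (foreground mask, updated background model)."""
    fg_mask = np.abs(frame - bg_model) > thresh
    bg_model = (1.0 - alpha) * bg_model + alpha * frame
    return fg_mask, bg_model

# Synthetic demo: a static "seabed" plus a bright 8x8 "fish".
rng = np.random.default_rng(0)
H, W = 48, 64
seabed = rng.integers(0, 50, size=(H, W)).astype(float)

def make_frame(x):
    """Scene with the fish at column x (x < 0 means no fish)."""
    f = seabed + rng.normal(0.0, 2.0, size=(H, W))
    if x >= 0:
        f[20:28, x:x + 8] = 200.0
    return f

bg = make_frame(-1)                        # bootstrap from a fish-free frame
mask, bg = update_and_detect(make_frame(30), bg)
ys, xs = np.nonzero(mask)
print("detected blob around column", xs.mean())   # ~33.5
```

In a real pipeline you would replace the toy model with one of OpenCV's learned background subtractors and clean the mask with morphology before extracting blobs.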
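
The covariance descriptor from step 3 is easy to prototype. The sketch below uses a reduced per-pixel feature (x, y, R, G, B) and a plain Frobenius distance instead of the Lie-algebra-based metric of Porikli et al., so it only illustrates the idea, not the paper's exact method:

```python
import numpy as np

def covariance_descriptor(patch):
    """Covariance descriptor of an RGB patch: the 5x5 covariance
    matrix of per-pixel feature vectors (x, y, R, G, B)."""
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([xs.ravel(), ys.ravel(),
                      patch[..., 0].ravel(),
                      patch[..., 1].ravel(),
                      patch[..., 2].ravel()], axis=0).astype(float)
    return np.cov(feats)

def descriptor_distance(c1, c2):
    """Frobenius distance between descriptors (the paper uses a
    Riemannian metric instead; this is a simpler stand-in)."""
    return np.linalg.norm(c1 - c2)

# The same fish seen with slight noise should match better than a
# completely different (darker) object.
rng = np.random.default_rng(1)
fish = rng.integers(100, 200, size=(16, 16, 3))
same = np.clip(fish + rng.integers(-5, 6, size=fish.shape), 0, 255)
other = rng.integers(0, 60, size=(16, 16, 3))

d_same = descriptor_distance(covariance_descriptor(fish),
                             covariance_descriptor(same))
d_other = descriptor_distance(covariance_descriptor(fish),
                              covariance_descriptor(other))
print(d_same < d_other)   # the noisy copy is the closer match
```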
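
Two of the plausibility checks from step 4 (shape-ratio stability and speed smoothness) can be sketched as follows; the function name, box format and thresholds are invented for illustration:

```python
import math

def plausible_step(prev_box, box, history_speeds,
                   max_ratio_change=0.3, max_speed_factor=3.0):
    """Accept or reject one tracking step. Boxes are (x, y, w, h);
    history_speeds holds per-frame speeds earlier in the trajectory."""
    # Shape ratio shouldn't change much between consecutive frames.
    r_prev = prev_box[2] / prev_box[3]
    r_cur = box[2] / box[3]
    if abs(r_cur - r_prev) / r_prev > max_ratio_change:
        return False
    # Speed should stay close to the trajectory's average speed.
    dx = box[0] - prev_box[0]
    dy = box[1] - prev_box[1]
    speed = math.hypot(dx, dy)
    if history_speeds:
        avg = sum(history_speeds) / len(history_speeds)
        if speed > max_speed_factor * max(avg, 1.0):
            return False
    return True

# A smooth step is accepted; a sudden jump is rejected.
print(plausible_step((10, 10, 40, 20), (14, 11, 41, 20), [4.0, 5.0]))  # True
print(plausible_step((10, 10, 40, 20), (90, 60, 41, 20), [4.0, 5.0]))  # False
```

The remaining checks (histogram, texture, temporal persistence) slot into the same accept/reject structure, each with its own threshold.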

I hope this information helps you.

UPD: The paper Hierarchical Classification for Live Fish Recognition recently became available online: PDF.