
Best features to track fish underwater

asked 2012-08-17 09:05:25 -0600

humbleAR

updated 2020-11-30 00:55:07 -0600

Hello everyone,

I am currently working on a project that aims to detect fish and track them in a video, for example by drawing a white box around a shark and a red one around a tuna. I have good-quality pictures of the fish to track. I would like to know which techniques would give robust tracking results on a video within an acceptable response time (1 second to detect is acceptable). Here is some information that may be useful:

  • Colour should be taken into consideration, because some fish species have the same shape but different colours;
  • Input video quality is quite good.

I would like to know which features would suit this application. For example, I have thought of combining SURF with color image moments. Any ideas are welcome; if you can add examples of similar working applications, I would greatly appreciate it.

Video frame: [image]

Image from database: [image]

Thank you all.




Post some frames. It's more than helpful.

sammy ( 2012-08-17 09:09:22 -0600 )

Added 2 pictures to illustrate a video frame and a picture from the database. It is possible to have several pictures of the same fish if necessary. Images have good resolution and no watermark. The video is at least 640x480 px at 25 fps.

humbleAR ( 2012-08-17 09:29:35 -0600 )

In my opinion, SURF and similar algorithms will not work well in this case, because of the distortions caused by the water and glass and because the appearance of a live fish changes too much: the fish will likely be in very different positions relative to the camera, and its body is not rigid, so when it swims it bends and its features change. Something based on color would be good, I think. Have you considered machine-learning-based approaches?

Rui Marques ( 2012-08-17 10:40:59 -0600 )

Thank you for posting, Rui Marques. Which approaches are you thinking about? Do you mean SVM? Haar classifiers? Neural networks? If you could give me technique and algorithm names and examples, it would help me read about them and decide whether they are adequate for what I'm doing.

humbleAR ( 2012-08-17 10:55:42 -0600 )

Here is a description of two major detection frameworks; I hope it will help you. However, I think that finding the best approach is a good research topic in itself - it's not something you will receive here in a short answer, but something you'll discover after a few months of work :) trying more possibilities and learning along the way. Approaches that most probably will not work are contour-based methods and template matching.

sammy ( 2012-08-17 11:03:00 -0600 )

What do you think about what is done here?

humbleAR ( 2012-08-17 11:03:15 -0600 )

Very useful comment, @sammy. I guess it's better to use texture descriptors in our case; which one do you think might be best here?

humbleAR ( 2012-08-17 11:09:44 -0600 )

First, in the video they say they use KLT, i.e. the Kanade-Lucas-Tomasi (Lucas-Kanade) tracker. Second, there is a huge difference in complexity between tracking and recognition - the first one is an hour of work. Finally, I would say that texture descriptors are a good way to go, but they require a lot of work. For example, to train a decent face detector you need at least a few hundred images, well cropped and correctly aligned.

sammy ( 2012-08-17 11:31:50 -0600 )

About that video, my guess is that it is tracking the eye of the fish by detecting the eye as a semi-circular blob. That is, no fish recognition is done; it is just following shapes that "kind of" look like eyes. As for machine learning approaches, I didn't mean any particular one; I do not have experience using ML, so I do not know which one would be better for your problem.

Rui Marques ( 2012-08-17 11:41:30 -0600 )

You mean that he tracks fish using KLT but doesn't recognize them? That's indeed not very useful in our case. It's quite interesting to have your opinion on that, because I don't and will not have hundreds of pictures of every fish. But I think the differences between fish species are bigger than the differences between human faces (same species). Approximately how many pictures do you think would be required for each species? I also wanted to ask about optical flow: do you think it could be of any use here, for tracking maybe?

Anyway, I really like your explanations and advice, @sammy.

humbleAR ( 2012-08-17 11:41:47 -0600 )

2 answers


answered 2012-09-10 02:01:08 -0600

updated 2012-09-30 06:57:43 -0600


I was sifting through BMVC 2012 papers (one of the top computer vision conferences, which was held just a few days ago) and noticed one work relevant to your problem: Hierarchical Classification for Live Fish Recognition by P. Huang, B. Boom and R. Fisher (University of Edinburgh). It was presented as a poster at the UK computer vision PhD student workshop right after BMVC, this Friday (7th September).

Unfortunately, I was able to find only two partial photos of this poster ([1], [2]), but this work is part of the much bigger Fish4Knowledge project. Their site contains a lot of useful information on the analysis of underwater video data, especially the report D1.1 Fish Detection and Tracking, where their algorithms are described.

Using this report and the poster you can reconstruct their approach:

  1. First of all, fish are detected as moving objects using motion-based background/foreground segmentation (btw, there are algorithms in the OpenCV video module for this).

  2. Detected moving blobs are filtered to exclude false positives. Several features are used for this:

    • Difference of color at object boundary (colors inside and outside of a fish should be different)
    • Difference of motion vectors at object boundary (motion vectors inside and outside of a fish should be different)
    • Internal color homogeneity (a fish is assumed to be monochromatic due to low quality of videos)
    • Internal motion homogeneity (motion vectors inside of a fish should be rather uniform)
    • Internal texture homogeneity (a fish is assumed to have uniform texture due to low quality of videos)
  3. Tracking of detected fish is based on the paper Covariance tracking using model update based on lie algebra by F. Porikli, O. Tuzel, P. Meer (CVPR 2005). A feature vector is computed for each pixel of a detected object. It consists of the pixel coordinates, the RGB and hue values, and the mean and standard deviation of the histogram of a 5×5 window that contains the target pixel. A covariance matrix of these feature vectors is computed and is then used to find the tracked fish among all detected objects in subsequent frames, so that tracking can continue.

  4. Plausibility of each trajectory is computed to exclude errors from the tracking. Several features are used:

    • Difference of shape ratio between frames (the ratio between width and height of a fish shouldn't change much between two consecutive frames)
    • Histogram difference (histograms of a fish should be similar between two consecutive frames)
    • Direction smoothness (trajectory shouldn't have sudden changes in a short term)
    • Speed smoothness (speed of a fish should be similar to the average speed in its history)
    • Texture difference (texture of a fish between two consecutive frames should be similar)
    • Temporal persistence (fish should be present in a video more than just for a few frames)
  5. If a fish is detected and tracked successfully, then its species is recognized. First of all, the head of the fish is found as a smoother part than the tail, and the fish is rotated to a canonical position ...
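
A few of the trajectory checks from step 4 can be sketched in plain Python. This is only an illustration under stated assumptions, not the Fish4Knowledge implementation: the `Detection` class, the function names and every threshold (`max_change`, `max_factor`, `max_turn`, `min_frames`) are hypothetical, and the histogram and texture checks are left out because they need image data.

```python
from dataclasses import dataclass
from math import atan2, hypot, pi


@dataclass
class Detection:
    """Bounding box of one fish in one frame (x, y = box centre, in pixels)."""
    x: float
    y: float
    w: float
    h: float


def shape_ratio_ok(track, max_change=0.3):
    """Width/height ratio should not change much between consecutive frames."""
    ratios = [d.w / d.h for d in track]
    return all(abs(b - a) / a <= max_change for a, b in zip(ratios, ratios[1:]))


def speed_ok(track, max_factor=3.0):
    """Each step's speed should stay close to the average speed so far."""
    speeds = [hypot(b.x - a.x, b.y - a.y) for a, b in zip(track, track[1:])]
    for i in range(1, len(speeds)):
        mean_so_far = sum(speeds[:i]) / i
        if mean_so_far > 0 and speeds[i] > max_factor * mean_so_far:
            return False
    return True


def direction_ok(track, max_turn=pi / 2):
    """Heading should not turn sharply between consecutive steps."""
    headings = [atan2(b.y - a.y, b.x - a.x) for a, b in zip(track, track[1:])]
    for a, b in zip(headings, headings[1:]):
        turn = abs((b - a + pi) % (2 * pi) - pi)  # wrap the difference to [0, pi]
        if turn > max_turn:
            return False
    return True


def is_plausible(track, min_frames=5):
    """Temporal persistence plus the three smoothness checks above."""
    return (
        len(track) >= min_frames
        and shape_ratio_ok(track)
        and speed_ok(track)
        and direction_ok(track)
    )
```

With these assumed thresholds, a track that moves steadily for ten frames passes, while a track with a sudden 200-pixel jump or one that lasts only three frames is rejected.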



Thanks a lot. Since some time has passed, I have read some of the papers related to this project, but I had not read the last one that you gave as a reference. Thanks also for the very clear reconstruction of their approach. This is one of the most complete answers I have seen on a forum in my life.

humbleAR ( 2012-10-02 03:41:04 -0600 )

answered 2014-05-15 08:53:11 -0600

Y Simson

Using back projection might work.

The basic idea is:

  1. Convert the reference fish image from RGB to HSV.
  2. Calculate the 2D hue-saturation (HS) histogram of the reference fish image.
  3. Turn the histogram into a probability distribution by normalizing its sum to 1.
  4. Convert the query image to HSV as well.
  5. For each pixel in the query image, find its bin in the HS histogram.
  6. Create an image in which each pixel holds the probability taken from the histogram for that bin.
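
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not OpenCV's implementation: the function names and the bin counts are assumptions, images are nested lists of (R, G, B) tuples with 0-255 channels, and the standard-library `colorsys` module does the HSV conversion. In OpenCV itself, `cv2.calcHist` and `cv2.calcBackProject` cover the histogram and back-projection steps.

```python
import colorsys

H_BINS, S_BINS = 30, 32  # bin counts are an arbitrary choice for this sketch


def hs_bin(pixel):
    """Map an (R, G, B) pixel with 0-255 channels to its (hue, saturation) bin."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    # Clamp so that a value of exactly 1.0 falls into the last bin.
    return min(int(h * H_BINS), H_BINS - 1), min(int(s * S_BINS), S_BINS - 1)


def hs_histogram(image):
    """2D hue-saturation histogram of an image, normalised so it sums to 1."""
    hist = [[0.0] * S_BINS for _ in range(H_BINS)]
    count = 0
    for row in image:
        for pixel in row:
            hb, sb = hs_bin(pixel)
            hist[hb][sb] += 1.0
            count += 1
    return [[value / count for value in row] for row in hist]


def back_project(query, hist):
    """Replace every query pixel by the probability of its bin in the reference histogram."""
    return [[hist[hb][sb] for hb, sb in map(hs_bin, row)] for row in query]
```

Pixels whose color is common in the reference image get a high value in the output, so thresholding the resulting map gives candidate fish regions.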

Here is a good explanation


