# Expectation Maximization Prediction Issues

I was trying to segment a leaf from the background in order to identify the disease affecting it. I started with a small dataset of images taken in a controlled environment; later I found a larger dataset, but its images were taken in less controlled conditions.

Controlled Environment

New Dataset Img Example

However, while the code works very well on the first dataset, it performs poorly on the second, as you can see below:

Results

```python
import cv2
import numpy as np

def train_em(samples, n_clusters):
    print('[Start training EM]')

    em = cv2.ml.EM_create()
    em.setClustersNumber(n_clusters)
    em.setCovarianceMatrixType(cv2.ml.EM_COV_MAT_DIAGONAL)
    em.trainEM(samples)

    print('[Done training EM]')
    return em

def predict(em, image):
    # Cache the cluster assignment per unique (channel0, channel1) value
    # so each color pair is only classified once.
    pixelDict = {}
    height, width, _ = image.shape
    for h in range(height):
        for w in range(width):
            key = (image[h, w, 0], image[h, w, 1])
            if key in pixelDict:
                continue
            _, probs = em.predict(np.float32([[image[h, w, 0], image[h, w, 1]]]))
            pixelDict[key] = 1 if probs[0][0] >= probs[0][1] else 0
    return pixelDict
```

Could someone give me a direction?



Expectation maximization, like most machine learning methods, learns to make decisions from the training data. So unfortunately it won't work well on data that differs from what it was trained on.

Your model learns that the RGB color of a healthy leaf is something like 140/160/80 ±10 (I'm simplifying). In the second image, the color is around 50/80/60, so it falls far outside the learned model.

There are two solutions:

• Use the same conditions for learning and prediction. In natural settings this is quite hard, as conditions change often and leaves show great variability.
• Find other descriptors for training/prediction that are invariant to imaging conditions and that discriminate well between the healthy and diseased parts of the leaf. Texture is a good starting point (Haralick descriptors, Gabor filters, tensors, wavelets, etc.).

@kbarni Well, I'm not sure about that, as I'm retraining it every time. Also, shadows are an issue here.

( 2018-12-09 16:26:12 -0500 )

As I said, color (esp. RGB color space) varies a lot with different conditions.

You need a more robust descriptor, and texture is a good candidate.

You can also try other color spaces such as Lab or HSV (just add a cvtColor call at the beginning of your algorithm). They're not as robust as texture, but better than RGB.

( 2018-12-10 03:25:58 -0500 )

@kbarni, did you encounter shadows in your paper? I'm trying to find a way to handle them.

( 2018-12-10 16:51:35 -0500 )

No, we just used a lighting-invariant model. It really works on images taken in uncontrolled conditions (in the field); we didn't do any laboratory tests.

( 2018-12-11 11:48:35 -0500 )

A lighting-invariant model? Do you have any references on it?

( 2018-12-12 18:04:39 -0500 )

Please don't make me repeat myself! Texture descriptors (see above) are quite robust to varying light conditions; for color, use the a* and b* channels from the Lab color space (it separates luminosity from color information).

( 2018-12-13 09:53:51 -0500 )

I'm not using RGB; I was already using just S and V from HSV to train it. I'll take a look at Lab color. Thank you!

( 2018-12-13 10:03:11 -0500 )

Did you publish the paper?

( 2020-02-11 11:44:38 -0500 )
