Understanding the use of EM

I am trying to use OpenCV's EM (expectation-maximization) to detect objects in a picture using Python. My first problem is that the official documentation does not clearly describe what the input should be.

This is my code so far:

import numpy as np
import cv2

# Read the image and keep only the green channel (index 1 in OpenCV's BGR order)
sample = cv2.imread(imgpath)[:,:,1]
image_rows, image_cols = sample.shape
# Flatten to a 1-D array and scale the values to [0, 1]
sample_flat = sample.flatten()*(1.0 / 255.0)

# EM training
em = cv2.EM(n_clusters)

log_likelihoods = np.zeros(shape=(image_rows * image_cols,1))
labels = np.zeros(shape=(image_rows * image_cols,1))
probs = np.zeros(shape=(image_rows * image_cols, n_clusters))

trained = em.train(sample_flat, log_likelihoods, labels, probs)
print('Trained? ->'+ str(trained[0]))

I have seen exactly what I want to do, but in C++, here. That example seems to pass a matrix of the picture's (x, y) coordinate pairs, with no pixel value attached to each pair. This confuses me, since I'm just starting out.
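
To make the confusion concrete, here is a minimal NumPy-only sketch of the two sample layouts I am choosing between. It assumes, from my reading of the docs, that EM wants a floating-point matrix with one sample per row; I have not confirmed that this is the whole story. The 4x6 image and the names intensity_samples and coord_samples are made up for illustration.

```python
import numpy as np

# Hypothetical 4x6 single-channel image standing in for the green channel
# read by cv2.imread(imgpath)[:, :, 1]; the values are made up.
rows, cols = 4, 6
sample = np.arange(rows * cols, dtype=np.uint8).reshape(rows, cols)

# Layout A: one row per pixel, a single column holding the scaled intensity.
# Clustering on this would group pixels by brightness.
intensity_samples = sample.reshape(-1, 1) / 255.0

# Layout B: one row per pixel holding only its (x, y) position, no pixel value.
# Clustering on this would group pixels by location, not by content.
ys, xs = np.mgrid[0:rows, 0:cols]
coord_samples = np.column_stack([xs.ravel(), ys.ravel()]).astype(np.float64)

print(intensity_samples.shape)
print(coord_samples.shape)
```

Both matrices have one row per pixel in the same row-major order, so row k of either one refers to the same pixel. Layout A is 2-D with a single column, whereas my sample_flat above is 1-D, which may be part of my problem.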

Can anybody please tell me what the exact input for training should be? Pointers to good documentation would be much appreciated as well.

Thanks in advance.

