Ask Your Question

Birdseye transform with getPerspectiveTransform

asked 2020-12-01 02:18:53 -0500

jonas_he

updated 2020-12-01 07:05:50 -0500

supra56

Hello, I am currently working on a Python program that tracks players on a soccer field. I got the player detection working with YOLOv3 and was able to output quite a nice result, with player centroids and boxes drawn. What I want to do now is translate the players' positions and project their centroids onto a PNG/JPG of a soccer field. For this I intended to use two arrays of reference points, one for the soccer-field image and one for the source video. But my question now is: how do I translate the coordinates of the centroids to the soccer-field image?

Similar example: [image]

How the boxes and Markers are drawn:

def draw_labels_and_boxes(img, boxes, confidences, classids, idxs, colors, labels):
    # If there are any detections
    if len(idxs) > 0:
        for i in idxs.flatten():
            # Get the bounding box coordinates
            x, y = boxes[i][0], boxes[i][1]
            w, h = boxes[i][2], boxes[i][3]

            # Draw the bounding box rectangle and a marker at the centroid
            cv.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), 2)
            cv.drawMarker(img, (int(x + w / 2), int(y + h / 2)),
                          (255, 255, 255), cv.MARKER_CROSS, 20, 3)
    return img

Boxes generated like this:

def generate_boxes_confidences_classids(outs, height, width, tconf):
    boxes = []
    confidences = []
    classids = []

    for out in outs:
        for detection in out:
            # Get the scores, classid, and the confidence of the prediction
            scores = detection[5:]
            classid = np.argmax(scores)
            confidence = scores[classid]

            # Consider only the predictions that are above a certain confidence level
            if confidence > tconf:
                box = detection[0:4] * np.array([width, height, width, height])
                centerX, centerY, bwidth, bheight = box.astype('int')

                # Using the center x, y coordinates to derive the
                # top-left corner of the bounding box
                x = int(centerX - (bwidth / 2))
                y = int(centerY - (bheight / 2))

                # Append to the lists
                boxes.append([x, y, int(bwidth), int(bheight)])
                confidences.append(float(confidence))
                classids.append(classid)

    return boxes, confidences, classids


@jonas_he: You can't do this kind of video streaming with warpPerspective alone. The tool in your screenshot (bottom right) is called PitchBrain; it is built with deep learning and OpenCV. You don't need YOLOv3, nor the rectangles and centroids. You do need to handle the two jersey colours, one white and one black.

supra56 ( 2020-12-01 13:11:36 -0500 )

How did you get PitchBrain?

supra56 ( 2020-12-02 08:13:48 -0500 )

2 answers

Sort by » oldest newest most voted

answered 2020-12-01 11:24:36 -0500

kbarni

But my question now is how do I translate the coordinates of the centroids to the soccerfield image.

DO NOT translate the rectangle centroids! The position of each player on the field is determined by the position of their feet, not their hips. So the position of a player in the field should be (centerX, centerY + bheight/2).

Then just multiply this vector* with the homography matrix** and voilà! You have projected the player to the 2D field.

* Oh, by the way: you need to add a 1 as the third element to the vector (so it becomes X, Y, 1).

** To get the homography matrix, you need at least 4 corresponding points from the two images (the more the better). Then call getPerspectiveTransform to compute the homography matrix.


answered 2020-12-01 11:06:35 -0500

crackwitz

your code doesn't contain getPerspectiveTransform. do you have the homography matrix returned by that call?

the homography matrix lets you translate any point in one image to the corresponding point in the other... but if you invert it, you can do the opposite!

you may or may not need np.linalg.inv to invert the matrix, and you need to matrix-multiply (np.dot or the @ operator) your point, in a specific format, with the matrix.

when you have a point (x, y), you need to construct a column vector v = np.array([[x], [y], [1.0]]), and then you can say v_mapped = M @ v; v_mapped = v_mapped / v_mapped[2]. now you have the point (x', y') in v_mapped[0:2]

I believe OpenCV's perspectiveTransform() procedure does this for you.

in the numpy case you may have to reshape the src argument suitably. if you have your points as an array of row vectors [[x,y], [x,y], ...], then you may have to reshape it to be (-1, 1, 2) and pass that



Asked: 2020-12-01 02:18:53 -0500

Seen: 80 times

Last updated: Dec 01 '20