The answer is simple, yes it is possible.
- First detect persons in images. This can be done with tons of techniques but I suggest the deep learning ones like SSD or YOLO.
- Then for each detection you will need a descriptor on that.
- Either use one descriptor as reference and find all matches in your other images.
- OR use a clustering algorithm and define how many people will appear (fixed clusters) or set some decision on the distance of cluster matching.
Nevertheless, a first step will be for you to start googling and read up on the topic :)