Unfortunately, there is no easy answer; it depends on the architecture and on what the network was trained on. Classification networks have a single layer of class predictions here, SSD-style detection networks have N rows with 7 numbers, and YOLOv3 ones have "regions" here.

For those SSD detections, position 0 is the image id (a sequential number, the index into the input batch) and position 1 is the classID (unused here, because we don't have cats & dogs, only faces). Position 2 holds the confidence, and positions 3-6 the box coordinates, relative to the input image.
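A minimal sketch of parsing that output in C++ (assuming a single-image batch; `net`, `frame`, the mean values and the 0.5 threshold are placeholders for your own setup):

    #include <opencv2/dnn.hpp>
    #include <opencv2/imgproc.hpp>

    // `net` is the loaded face detection model, `frame` a BGR input image
    cv::Mat inputBlob = cv::dnn::blobFromImage(frame, 1.0, cv::Size(300, 300),
                                               cv::Scalar(104, 177, 123));
    net.setInput(inputBlob);
    cv::Mat out = net.forward();                       // [1 x 1 x N x 7]

    // wrap the N x 7 part into a 2d Mat, so we can use rows & cols again
    cv::Mat detections(out.size[2], out.size[3], CV_32F, out.ptr<float>());
    for (int i = 0; i < detections.rows; i++) {
        float confidence = detections.at<float>(i, 2); // position 2: score
        if (confidence < 0.5f) continue;               // skip weak detections
        // positions 3..6: box corners, relative to the input image
        int x1 = int(detections.at<float>(i, 3) * frame.cols);
        int y1 = int(detections.at<float>(i, 4) * frame.rows);
        int x2 = int(detections.at<float>(i, 5) * frame.cols);
        int y2 = int(detections.at<float>(i, 6) * frame.rows);
        cv::rectangle(frame, cv::Point(x1, y1), cv::Point(x2, y2),
                      cv::Scalar(0, 255, 0), 2);
    }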
Those "blobs" are 4d tensors, and 4 dimensions don't fit into rows & cols, so those are -1, and you have to look at the size array to retrieve that information: size[0]==nImages, size[1]==numChannels, size[2]==H, size[3]==W.
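For example (a sketch; `out` is the result of `net.forward()` from above):

    CV_Assert(out.dims == 4);     // rows & cols are -1 for nd Mats
    int nImages   = out.size[0];  // number of images in the batch
    int nChannels = out.size[1];  // 1 for the detection blob above
    int H         = out.size[2];  // N, the number of detections
    int W         = out.size[3];  // 7 numbers per detection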
Yes, it was originally trained on 300x300 images. If you feed a smaller one, it will get upscaled automatically. Note that it might get faster (but somewhat less accurate) if you use a smaller input size, like (128, 96) (used in the js demo).
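To try that, you would only change the size passed to blobFromImage (a sketch; the mean values again belong to the face model assumed above):

    // blobFromImage resizes the frame to 128x96 for you
    cv::Mat inputBlob = cv::dnn::blobFromImage(frame, 1.0, cv::Size(128, 96),
                                               cv::Scalar(104, 177, 123));
    net.setInput(inputBlob);
    cv::Mat out = net.forward();  // still [1 x 1 x N x 7], and the box
                                  // coordinates stay relative to the image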