Unfortunately, there is no easy answer; it depends on the architecture and on what the network was trained on. Classification networks have a single layer of class predictions here, SSD-style detection networks have N rows with 7 numbers, and YOLOv3 ones have "regions" here.

For those SSD detections, position 0 is the image id (a sequential number, the index into the input batch) and position 1 is the classID (unused here, because we don't have cats & dogs, only faces). Position 2 holds the confidence, and positions 3-6 the box coordinates, relative to the input image.
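A minimal sketch of parsing that output in C++ (assuming a single-image batch; `net`, `frame`, the mean values and the 0.5 threshold are placeholders for your own setup):

    #include <opencv2/dnn.hpp>
    #include <opencv2/imgproc.hpp>

    // `net` is the loaded face detection model, `frame` a BGR input image
    cv::Mat inputBlob = cv::dnn::blobFromImage(frame, 1.0, cv::Size(300, 300),
                                               cv::Scalar(104, 177, 123));
    net.setInput(inputBlob);
    cv::Mat out = net.forward();                       // [1 x 1 x N x 7]

    // wrap the N x 7 part into a 2d Mat, so we can use rows & cols again
    cv::Mat detections(out.size[2], out.size[3], CV_32F, out.ptr<float>());
    for (int i = 0; i < detections.rows; i++) {
        float confidence = detections.at<float>(i, 2); // position 2: score
        if (confidence < 0.5f) continue;               // skip weak detections
        // positions 3..6: box corners, relative to the input image
        int x1 = int(detections.at<float>(i, 3) * frame.cols);
        int y1 = int(detections.at<float>(i, 4) * frame.rows);
        int x2 = int(detections.at<float>(i, 5) * frame.cols);
        int y2 = int(detections.at<float>(i, 6) * frame.rows);
        cv::rectangle(frame, cv::Point(x1, y1), cv::Point(x2, y2),
                      cv::Scalar(0, 255, 0), 2);
    }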
Those "blobs" are 4d tensors, and 4 dimensions don't fit into rows & cols, so those are -1, and you have to look at the size array to retrieve that information: size[0]==nImages, size[1]==numChannels, size[2]==H, size[3]==W.
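For example (a sketch; `out` is the result of `net.forward()` from above):

    CV_Assert(out.dims == 4);     // rows & cols are -1 for nd Mats
    int nImages   = out.size[0];  // number of images in the batch
    int nChannels = out.size[1];  // 1 for the detection blob above
    int H         = out.size[2];  // N, the number of detections
    int W         = out.size[3];  // 7 numbers per detection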
Yes, it was originally trained on 300x300 images. If you feed a smaller one, it will get upscaled automatically. Note that it might get faster (but somewhat less accurate) if you use a smaller input size, like (128, 96) (used in the js demo).
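To try that, you would only change the size passed to blobFromImage (a sketch; the mean values again belong to the face model assumed above):

    // blobFromImage resizes the frame to 128x96 for you
    cv::Mat inputBlob = cv::dnn::blobFromImage(frame, 1.0, cv::Size(128, 96),
                                               cv::Scalar(104, 177, 123));
    net.setInput(inputBlob);
    cv::Mat out = net.forward();  // still [1 x 1 x N x 7], and the box
                                  // coordinates stay relative to the image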