
OpenCV Classification Neural Network + Image Flattening Question

asked 2020-07-12 22:32:32 -0600

onthegrid007

There's no simple way to explain this, so I will get to the point.

For a neural network, would it be better for me to flatten a color image down to a single value between 0 and 1 per pixel, of the form 0.RRRGGGBBBAAA (so pixel 0 holding .255000000255 would mean a red pixel at position 0), or to convert to HSV, threshold it, then greyscale each pixel to a grey value between 0 and 1?

Some background. I build what I call PxlDbls, or "Pixel Doubles" spelled out, and it's exactly how it sounds: a pixel's color packed into a single double between 0 and, at most, .255255255255. Why a double? A float didn't have the precision I needed to achieve this.

I currently have a custom neural network with many activation functions to be used in each topological layer. The one I am using for images is the well-known sigmoid, which takes a number between 0 and 1. My end goal, and the question, is how best to flatten an image to place into this network.

Should I take my image size and multiply it by 4, so that there is one neuron for each color channel of each pixel? I could map each value from 0-255 to 0-1, which would give me effectively what I am looking for, but a 600x400 image x 4 channels results in 960,000 neurons in the first layer, which even in my case alone is truly unattainable no matter how well I program the network... With a theoretical PxlDbl, the neuron count shrinks down to just the image size, 600x400 = 240,000 neurons, which is much more manageable in terms of weights and such.

My only concern is that with the inputs being so precise, the outputs and their classifications might be much more difficult to train; i.e., with something like .255128076255 it might just drop the 128076255 in training because it just can't handle numbers that precise.
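As a concrete sketch of the packing described above (Python, with hypothetical helper names; this is not part of any library): each 0-255 channel occupies three fixed decimal digits after the point, and a double's 15-16 significant decimal digits are just enough to hold all twelve, while a float's ~7 are not.

```python
# Hypothetical "PxlDbl" packing: RGBA(255, 0, 0, 255) -> 0.255000000255.
# Each channel is a fixed three-digit decimal field.

def pack_pxldbl(r, g, b, a):
    # Concatenate the four 0-255 channels as three-digit decimal fields.
    return (r * 10**9 + g * 10**6 + b * 10**3 + a) / 10**12

def unpack_pxldbl(d):
    # Recover the channels by shifting the digits back out.
    n = round(d * 10**12)
    return (n // 10**9, (n // 10**6) % 1000, (n // 10**3) % 1000, n % 1000)

print(pack_pxldbl(255, 0, 0, 255))
print(unpack_pxldbl(pack_pxldbl(12, 34, 56, 78)))
```

The round trip works because twelve decimal digits fit within a double's 53-bit mantissa; whether such a packed number is a useful network input is a separate question.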

Help on this would be greatly appreciated...


1 answer


answered 2020-07-13 01:50:52 -0600

berak

updated 2020-07-13 02:14:05 -0600

Your idea won't work.

Though your PxlDbls are doubles, they don't form a "metric space".

(Same problem with "hex" like 0xa0dde2f3 already -- you can't do proper maths on it as long as you keep it in a single number.)

Finding a nice "embedding" for your classification task is indeed a hard problem; that's why nowadays mostly convolutional NNs are used, which learn the embedding in an "end-to-end" fashion.
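A small Python sketch of what "no metric space" means in practice (pack mirrors the hypothetical PxlDbl encoding from the question; the colours are made up): a tiny change in the red channel moves the packed value further than a huge change in the other three channels combined, so distances, gradients and weighted sums over packed values are meaningless.

```python
def pack(r, g, b, a):
    # Same hypothetical PxlDbl packing as described in the question.
    return (r * 10**9 + g * 10**6 + b * 10**3 + a) / 10**12

def chan_dist(c1, c2):
    # Ordinary Euclidean distance in channel space, for comparison.
    return sum((x - y) ** 2 for x, y in zip(c1, c2)) ** 0.5

A = (100, 0, 0, 0)
B = (99, 255, 255, 255)   # very different colour, nearby packed value
C = (110, 0, 0, 0)        # similar colour, distant packed value

print(abs(pack(*A) - pack(*B)), chan_dist(A, B))
print(abs(pack(*A) - pack(*C)), chan_dist(A, C))
# The packed "distance" ranks B closer to A than C; channel-wise
# distance says the opposite.
```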



So the idea was essentially to save data in terms of processing and memory. Alternatively, I can scale down the original image and multiply the area by 3 (for R, G, B), so the first three neurons would represent the first pixel's RGB color value scaled between 0 and 1. Also, with this, would it be more appropriate to use HSV values instead?

onthegrid007 ( 2020-07-13 13:00:33 -0600 )

Forget all of it, and read up on CNNs and deep learning.

berak ( 2020-07-13 13:29:01 -0600 )

blobFromImage(sub, blob, 1 / 255.0, Size(networkWidth, networkHeight), Scalar(0, 0, 0), true, false);

Scaling the pixel values down to 0-1 is fine imho; the rest is not.

@berak Am I missing something? I read your comment, but here I am nearly 100% sure that feature scaling is fine. If not, I'll listen to you and start with the basics again...

holger ( 2020-07-14 08:14:54 -0600 )
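The 1/255.0 scalefactor mentioned above can be mimicked in a few lines of plain Python (the 2x2 RGB "image" here is made up for illustration): each channel becomes one input in [0, 1], so a WxHx3 image produces W*H*3 input values rather than one packed number per pixel.

```python
# Per-channel feature scaling: the pure-Python equivalent of passing
# scalefactor = 1/255.0 to blobFromImage. Hypothetical 2x2 RGB image.

image = [  # rows of (R, G, B) pixels
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

# Flatten to one input value per channel, scaled into [0, 1].
flat = [c / 255.0 for row in image for px in row for c in px]

print(len(flat))   # 2 * 2 * 3 = 12 inputs
print(flat)
```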

I'm only objecting to "put a whole 4-channel pixel into a single double".

"Feature scaling" is widely used; think of PCA and such.

berak ( 2020-07-15 02:54:05 -0600 )

Agreed - thank you for the clarification :)
Putting a 4-channel pixel into a single double would destroy the spatial information.

PCA actually reduces the dimensionality by using projection. I think of PCA as a compression algorithm.

Intentional or not, you again gave the solution imho. You can apply PCA to images too! You would need to stack the pixel values into a single 1D vector anyway for the input layer. But with PCA the input vector should be smaller, resulting in fewer neurons in the first layer, if I understand it correctly?!

But with PCA you would still lose some minor information in the process, at the risk of losing some accuracy, and also some performance, as PCA takes time, so I personally would not do it.
Have a nice week(end) - Greetings.

holger ( 2020-07-15 17:33:49 -0600 )
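Since PCA came up as a way to shrink the input vector, here is a minimal sketch (pure Python, 2-D only, made-up sample data) of projecting onto the leading principal component; PCA on flattened images works the same way, just on much longer vectors.

```python
# Minimal 2-D PCA: project correlated samples onto the top principal
# component, halving the number of inputs at the cost of some information.
import math

# Made-up, strongly correlated 2-D samples (think: two neighbouring pixels).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

# Centre the data.
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centred = [(x - mx, y - my) for x, y in data]

# 2x2 covariance matrix entries.
cxx = sum(x * x for x, _ in centred) / len(data)
cyy = sum(y * y for _, y in centred) / len(data)
cxy = sum(x * y for x, y in centred) / len(data)

# Leading eigenvector of [[cxx, cxy], [cxy, cyy]] via the closed 2x2 form.
theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
ex, ey = math.cos(theta), math.sin(theta)

# Project each 2-D sample down to a single value: half as many inputs.
projected = [x * ex + y * ey for x, y in centred]
print(projected)
```

The projection keeps the direction of maximum variance, which is why the reduced representation loses only "minor" information when the inputs are strongly correlated, as neighbouring pixels usually are.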

Well, eigenfaces does exactly that (exploiting that the variance between subjects increases in the projected PCA space).

Also: various filter banks, Haralick features, census/LBPH transforms, ...

Still, this is all "shallow" learning: preparing features and learning classifiers separately.

"Have a nice week(end)"

Hey, it's only Thursday morning ;)

berak ( 2020-07-16 01:29:33 -0600 )

Interesting - so eigenfaces uses the downside of PCA as an advantage?
That is awesome.

For me, every day is the weekend - *hustles*

holger ( 2020-07-16 05:39:03 -0600 )
