
CNNs are truly remarkable. It is well established that they are solving problems that years of previous academic and industrial research could not.

Previously, object classification and segmentation relied on carefully hand-designed feature maps followed by some decision-making process (k-NNs, SVMs, etc.), and a detection system was only as robust as its feature map was well designed. HOG, SIFT and Haar features are examples of this.
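To make that pipeline concrete, here is a minimal sketch of the classic approach: a fixed, hand-designed feature map (HOG) feeding a conventional classifier (SVM). It assumes scikit-image and scikit-learn are installed; the digits dataset and the HOG parameters are just stand-ins, not a recommendation for any particular problem.

```python
# Classic CV pipeline sketch: hand-designed features + conventional classifier.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from skimage.feature import hog

digits = load_digits()  # small 8x8 grayscale images, used only as a stand-in dataset

def extract_features(images):
    # The hand-designed feature map: every image goes through the same fixed HOG descriptor.
    return np.array([
        hog(img, orientations=8, pixels_per_cell=(4, 4), cells_per_block=(1, 1))
        for img in images
    ])

X = extract_features(digits.images)
X_train, X_test, y_train, y_test = train_test_split(X, digits.target, random_state=0)

clf = LinearSVC()          # the decision-making stage (could just as well be k-NN)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```

The point is that all the "vision" lives in `extract_features`: if the descriptor is badly chosen for the problem, no classifier on top of it can save the system.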

However, as more computational power became available, GPUs started to be taken advantage of, and more and more labelled data could be used for designing systems, researchers started training stacks of sequential convolutional filters (what people now call deep neural networks) to learn the best possible feature map for a specific problem. What they realized was that no hand-designed feature map would ever be as good as letting the computer learn one.

This was an important advance in computer vision, but in my opinion it creates a new problem. Since this strategy is completely dependent on huge amounts of labelled data, as well as heaps of computational power, what we are observing is that computer vision research is effectively monopolized by big companies such as Google, Facebook, Amazon and a few others. And since they can solve their CV problems with this strategy, I don't see any motivation for them to advance CV further. Why would they? If they can sell their services, and if the best algorithms depend on their data, what is their motivation to let CV move to a new and even better era?

For me, I can say that I am a bit disappointed in the direction everything took. Doing CV used to mean giving eyes to a computer: figuring out which visual features of an object could be used to tell it apart from the rest of a scene, and finding ways to process an image so that those features were enhanced. That is over, since no human being can beat a computer at figuring out the best way to process an image to solve a classification problem. Now, doing CV means downloading an existing CNN model and making some tweaks to adapt it to your problem (see the sketch below). It's interesting to see that most recent research papers start from existing CNN models and expand from there. And no one really understands what is happening inside; it's a trial-and-error strategy.
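A minimal sketch of what that "download and tweak" workflow usually looks like, assuming PyTorch with a recent torchvision; the number of classes and the optimizer settings are placeholders for your own problem, and the training loop is only indicated in comments because it depends on your own labelled data.

```python
# Transfer-learning sketch: reuse a pre-trained CNN, retrain only a new head.
import torch
import torch.nn as nn
from torchvision import models

# Load a model someone else already trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional feature extractor; we keep its learned feature map as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for your own problem
# (5 classes here is purely an example).
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head's parameters get trained.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop sketch (a DataLoader over your own labelled images would go here):
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```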

To answer your question, if you can find an existing model that works for your problem, you should use it.
