Any future plan to add an audio input function (similar to blobFromImage) to the DNN module?

asked 2018-11-12 12:24:12 -0500

kkudryavtsev

updated 2018-11-12 12:42:53 -0500

berak

Would that make it possible to run some TensorFlow models (like the DeepSpeech project) for sound recognition?



i can't speak for the devs here, but it sounds highly unlikely to happen.

opencv is still a computer-vision library, and the tensorflow audio api is very complex: it contains means to load files, calculate mel-frequency coefficients, do time-stretching, and various other processing.

related question

berak ( 2018-11-13 01:40:36 -0500 )

Understood, thank you! I just thought that OpenCV::DNN is one of the best libraries in terms of speed and ease of use. It also already supports all the layer types needed for DeepSpeech, even though the overall algorithm is quite complex. So an audio input function would encourage use of the library outside the vision field.

kkudryavtsev ( 2018-11-13 10:46:55 -0500 )
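
For anyone landing here with the same idea: since there is no audio counterpart to blobFromImage, the preprocessing can be done outside OpenCV and the result handed to the DNN module as an ordinary blob. Below is a minimal sketch in plain NumPy that turns a mono signal into a log-mel spectrogram shaped like a 4D blob. The names `audio_to_blob` and `mel_filterbank` are made up for this example, and the parameters (25 ms window, 10 ms hop, 26 mel bands, 512-point FFT) are assumptions, not the actual DeepSpeech preprocessing. A real model needs exactly the features it was trained on.

```python
import numpy as np


def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, evenly spaced on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_mels, fft_freqs.size), dtype=np.float32)
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def audio_to_blob(signal, sample_rate=16000, frame_len=400, hop=160,
                  n_fft=512, n_mels=26):
    """Hypothetical helper: mono signal -> float32 blob of shape (1, 1, frames, n_mels)."""
    signal = np.asarray(signal, dtype=np.float32)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Power spectrum per frame (zero-padded to n_fft), then mel energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-6)  # small offset avoids log(0)
    return log_mel[None, None].astype(np.float32)


# One second of a 1 kHz tone as stand-in input.
t = np.arange(16000) / 16000.0
blob = audio_to_blob(np.sin(2.0 * np.pi * 1000.0 * t))
print(blob.shape, blob.dtype)
```

In the Python bindings `net.setInput` takes a NumPy array, so an array like this could be passed to a network loaded with `cv2.dnn.readNet`, provided its shape and feature layout match what the model expects. That part is not shown here because it depends entirely on the model.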