What are some good resources for an Arabic OCR in the wild dataset?

arabic
OCR

asked 2016-05-17 02:11:03 -0600

Hello there, I've recently started working on an OCR in the wild algorithm using neural networks. My requirements are as follows: Arabic text, natural images (not scans, etc.).

My goal is to detect whether the image contains text and, if it does, to extract that text.

I need a large dataset for this. If one exists, that would be great; otherwise, I would appreciate some help thinking of reasonable methods to create such a dataset on my own.

Thank you very much, A Dylan


Comments

On Ubuntu, if you type sudo apt-get install tesseract-ocr and then press Tab, you can see the range of language models available for the Tesseract OCR system.

StevenPuttemans ( 2016-05-17 09:15:02 -0600 )
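Once a language package is installed, it may help to confirm that Tesseract itself sees it. The sketch below is only an illustration, not something from the comment above: it assumes a `tesseract` binary on the PATH that supports the `--list-langs` option, and the helper names `parse_langs` and `installed_langs` are made up for this example.

```python
import shutil
import subprocess


def parse_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Skip the header line, e.g. "List of available languages (2):".
    return [ln for ln in lines if not ln.lower().startswith("list of")]


def installed_langs():
    """Return the language codes Tesseract reports, or [] if it is not installed."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Assumption: depending on the version, the list may go to stdout or stderr,
    # so both streams are parsed.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("Arabic model visible to Tesseract:", "ara" in langs)
```

If `ara` is not in the list, the language data is probably not in the tessdata directory that this Tesseract build reads.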

Hey, I tried it, but I got an error when running the following command: tesseract photo.jpeg out -l ara (I installed the language package). The error is:

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from /opt/local/share/tessdata/ara.cube.lm
Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object
init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file tessedit.cpp, line 205

adamdylan ( 2016-05-18 03:09:22 -0600 )
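The error message names a specific file, ara.cube.lm, in /opt/local/share/tessdata. A quick way to see whether that file is actually there is sketched below. This is a rough diagnostic under assumptions: the function name `check_lang_files` is hypothetical, and the `required` suffixes are limited to the one file from the error plus ara.traineddata; it is not a complete list of what the Arabic model needs.

```python
from pathlib import Path


def check_lang_files(tessdata_dir, lang="ara", required=("traineddata", "cube.lm")):
    """Return (present, missing) file names for one language in a tessdata directory."""
    base = Path(tessdata_dir)
    # Every file that belongs to the language, e.g. ara.traineddata, ara.cube.lm.
    present = sorted(p.name for p in base.glob(lang + ".*")) if base.is_dir() else []
    # Only the suffixes listed in `required` are checked (an assumption, see above).
    missing = [lang + "." + s for s in required if lang + "." + s not in present]
    return present, missing


if __name__ == "__main__":
    # Directory taken from the error message above.
    present, missing = check_lang_files("/opt/local/share/tessdata")
    print("found:", present)
    print("missing:", missing)
```

If ara.cube.lm shows up as missing, the installed language package most likely did not include the cube files, or they were placed in a different tessdata directory.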

I guess you will need to raise this as an issue on the Tesseract GitHub to get better support!

StevenPuttemans ( 2016-05-18 03:21:29 -0600 )
