HELP GRANTED : C# OCR (Optical Character Recognition)

C# OCR (Optical Character Recognition)

OCR, as the title says, stands for Optical Character Recognition: the ability to extract the text characters that appear in an image.

We will be using the MODI (Microsoft Office Document Imaging) type library, which C# accesses through COM interop.

The MODI library ships with the Microsoft Office suites (2003 to 2007). Unfortunately, it is not included in the 2010 version.
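Since MODI's availability depends on which Office version is installed, it can help to check for it before running any OCR code. The following is only a minimal sketch: it assumes a Windows machine and assumes that "MODI.Document" is the ProgID under which the MODI document class is registered. The class name ModiCheck is made up for this illustration.

```csharp
using System;

class ModiCheck
{
    static void Main()
    {
        // Assumption: "MODI.Document" is the ProgID registered by the MODI component.
        // Type.GetTypeFromProgID returns null when the ProgID cannot be resolved.
        Type modiType = Type.GetTypeFromProgID("MODI.Document");

        if (modiType != null)
        {
            Console.WriteLine("MODI appears to be registered on this machine.");
        }
        else
        {
            Console.WriteLine("MODI was not found; the OCR code will not run here.");
        }
    }
}
```

A successful lookup only shows that the COM registration exists, not that OCR will succeed on a given image.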

Add a reference to the MODI type library (COM interop) in your project, then convert image(s) to text like this:

 
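using MODI;
using System;

class Program
{
    static void Main(string[] args)
    {
        DocumentClass myDoc = new DocumentClass();
        myDoc.Create(@"theDocumentName.tiff"); // we work with the .tiff extension

        // Recognize English text; the two flags ask MODI to auto-orient
        // and straighten the image before recognition.
        myDoc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);

        // Each Image is one page of the document.
        foreach (Image anImage in myDoc.Images)
        {
            Console.WriteLine(anImage.Layout.Text); // write the recognized text to the console
        }

        myDoc.Close(false); // release the file without saving changes
    }
}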

Leave me a comment if you need help with it.

6 comments:

Subi, 10/15/12, 2:34 PM
MODI is no longer packaged with MS Office starting with MS Office 2010. What is the alternative solution for now?
Anonymous, 11/5/12, 12:15 PM
@subi, Office 2010 has MODI http://support.microsoft.com/kb/982760. Don't lie..
Unknown, 2/5/14, 7:27 AM
I want to convert images of Arabic text to text. MODI can't convert this. Is there any way to do it without a third-party tool?
Unknown, 7/10/14, 11:02 AM
How can I use a URL as the image source?
Please help!
Ahmed, 10/14/22, 3:51 AM
I really enjoyed reading your post. Well-written and insightful articles like yours are well worth my time. Thanks for your efforts. The scanned images of both Arabic and English documents can now be converted into fully searchable and editable text files. The accuracy and reliability of RDI's OCR engine, as well as our character recognition software, make this possible.