C# OCR (Optical Character Recognition)

OCR, as the title says, stands for Optical Character Recognition: the ability to extract characters as they appear in an image.


We will be using the MODI (Microsoft Office Document Imaging) type library, which is accessed from C# through COM interop.


The MODI library ships with the Microsoft Office suites (2003 to 2007). Unfortunately, it is not available in the 2010 version.
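
Since MODI is only present with certain Office versions, it can help to check whether it is registered before trying to use it. The following is a minimal sketch, assuming the library registers the COM ProgID "MODI.Document" (an assumption worth verifying on your machine) and that the program runs on Windows. The class name ModiCheck is made up for this illustration.

```csharp
using System;

class ModiCheck
{
    static void Main()
    {
        // Assumption: "MODI.Document" is the ProgID registered by Office Document Imaging.
        // GetTypeFromProgID returns null when no COM class is registered under that ProgID.
        Type modiType = Type.GetTypeFromProgID("MODI.Document");

        if (modiType == null)
            Console.WriteLine("MODI does not appear to be registered on this machine.");
        else
            Console.WriteLine("MODI appears to be available.");
    }
}
```

If the check reports that MODI is missing, the OCR code in this post will fail when it tries to create the document object.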




Add a reference to the MODI type library (COM interop) in your project and convert image(s) to text like this:
 
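using MODI;
using System;
 
class Program
{
    static void Main(string[] args)
    {
        // Load the image file into a MODI document (we work with the .tiff extension).
        DocumentClass myDoc = new DocumentClass();
        myDoc.Create(@"theDocumentName.tiff");

        // Run OCR in English; the two flags ask MODI to auto-orient and straighten the image.
        myDoc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);
 
        // Each page of the document is an Image; Layout.Text holds its recognized text.
        foreach (Image anImage in myDoc.Images)
        {
            Console.WriteLine(anImage.Layout.Text); // print the text to the console
        }

        // Close the document without saving the OCR results back to the file.
        myDoc.Close(false);
    }
}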






Leave me a comment if you need help with it.
