Automating document processing can improve productivity and accuracy in industries ranging from legal and financial services to healthcare and manufacturing. One tool for this task is Aspose.OCR for .NET, which lets developers extract text from scanned documents and images. This tutorial shows how to use Aspose.OCR for .NET for batch OCR: recognizing text in every image in a directory and saving the results as text files that other systems can consume.

Complete Example

The complete example for this tutorial is hosted as a gist. It uses Aspose.OCR for .NET to perform OCR on multiple images in a directory and save the extracted text to corresponding text files. The gist is the source of truth for the steps that follow.


Step-by-Step Guide

Step 1: Initialize the OCR Engine

Create and configure the OCR engine. Set the desired language (English in this example).

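// Step 1: Initialize the OCR Engine
using System;      // Console, Exception (used in Steps 3 and 4)
using System.IO;   // Directory, Path, File (used in Steps 2 and 4)
using Aspose.Ocr;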

using (Ocr ocrEngine = new Ocr())
{
    // Set language and other configurations if needed
    ocrEngine.Language = Language.English;

    // (Continue with steps below inside this using block)
}

Step 2: Load Images for Processing

Define input/output directories, ensure the output folder exists, and enumerate image files.

// Step 2: Load Images for Processing
string inputDirectory = @"path\to\input\images";
string outputDirectory = @"path\to\output\text";

// CreateDirectory is a no-op when the folder already exists
Directory.CreateDirectory(outputDirectory);

// Get all files from the input directory (same pattern as the gist)
// TIP: to restrict to specific formats, replace "*.*" with "*.png" or "*.jpg"
string[] imageFiles = Directory.GetFiles(
    inputDirectory,
    "*.*",
    SearchOption.TopDirectoryOnly
);
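
Because "*.*" also matches non-image files (for example Thumbs.db or leftover .txt files), one option is to filter by extension after enumerating. The following is a minimal sketch, not part of the gist: the ImageFileFinder class and its extension list are illustrative assumptions, so adjust the list to the formats your OCR setup actually accepts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ImageFileFinder
{
    // Hypothetical allow-list of image extensions (an assumption, not an
    // official list of formats supported by any OCR library).
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif"
        };

    // True when the file's extension is on the allow-list (case-insensitive).
    public static bool IsImageFile(string path) =>
        ImageExtensions.Contains(Path.GetExtension(path));

    // Enumerates files under inputDirectory and keeps only image files,
    // sorted so that repeated runs process files in the same order.
    public static string[] FindImages(string inputDirectory, bool recurse = false) =>
        Directory.EnumerateFiles(
                inputDirectory,
                "*",
                recurse ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly)
            .Where(IsImageFile)
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .ToArray();
}
```

With this helper, the Directory.GetFiles call in Step 2 could become string[] imageFiles = ImageFileFinder.FindImages(inputDirectory);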

Step 3: Perform OCR on Each Image

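Iterate over the files and recognize text using RecognizeImage(string path). The try/catch wraps each file individually, so one unreadable image is logged and the rest of the batch continues.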

// Step 3: Perform OCR on Each Image
foreach (string imageFile in imageFiles)
{
    try
    {
        // Recognize text from the image (exactly as in the gist)
        string recognizedText = ocrEngine.RecognizeImage(imageFile);

        // Proceed to Step 4: save text to disk...
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing {imageFile}: {ex.Message}");
    }
}

Step 4: Save Extracted Text to Files

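Create a corresponding .txt file for each processed image. This code uses imageFile and recognizedText, so it belongs inside the try block from Step 3, immediately after the RecognizeImage call.

// Step 4: Save Extracted Text to Files (inside the try block from Step 3)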
string outputFilePath = Path.Combine(
    outputDirectory,
    Path.GetFileNameWithoutExtension(imageFile) + ".txt"
);

File.WriteAllText(outputFilePath, recognizedText);

Console.WriteLine($"Processed: {imageFile} -> {outputFilePath}");
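
One thing to watch: Path.GetFileNameWithoutExtension maps scan.png and scan.jpg to the same scan.txt, so the later file silently overwrites the earlier one, and the same happens for identically named files in different subfolders when recursing. The sketch below shows one possible alternative that mirrors the image's path relative to the input folder and keeps the original extension in the output name. The OutputPaths class is a hypothetical helper, not part of the gist, and Path.GetRelativePath assumes .NET Core 2.0 or later (it is not available on .NET Framework).

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Builds an output path that is unique per input file:
    //   <input>/invoices/scan.png  ->  <output>/invoices/scan.png.txt
    public static string ForImage(
        string inputDirectory, string outputDirectory, string imageFile)
    {
        // Path of the image relative to the input root, e.g. "invoices/scan.png".
        string relativePath = Path.GetRelativePath(inputDirectory, imageFile);

        // Appending ".txt" (instead of replacing the extension) keeps
        // scan.png and scan.jpg from colliding on scan.txt.
        return Path.Combine(outputDirectory, relativePath + ".txt");
    }
}
```

When recursing into subfolders, create the target folder before writing, for example with Directory.CreateDirectory(Path.GetDirectoryName(outputFilePath)).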

Tips & Tweaks

  • Filter formats: Use patterns like "*.png" or "*.jpg" to skip non-image files.
  • Recurse subfolders: Change SearchOption.TopDirectoryOnly to SearchOption.AllDirectories.
  • Skip empty outputs: If string.IsNullOrWhiteSpace(recognizedText), log and continue.
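  • Parallel batches: Use Parallel.ForEach(imageFiles, file => { ... }) for faster runs, but cap the degree of parallelism to avoid saturating disk I/O, check your license terms, and confirm that the engine instance is safe to share across threads.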
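
The "skip empty outputs" and "parallel batches" tips can be combined roughly as follows. This is a sketch under stated assumptions rather than code from the gist: BatchOcr is a hypothetical helper, and the recognize delegate stands in for the OCR call (for example path => ocrEngine.RecognizeImage(path)), which keeps the example independent of any particular engine. Whether a single engine instance can be shared across threads is not covered by this tutorial, so verify that before using it in production.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchOcr
{
    // Runs `recognize` over all files in parallel and returns a map of
    // image path -> recognized text. Files that fail or yield only
    // whitespace are logged and left out of the result.
    public static IDictionary<string, string> Run(
        IEnumerable<string> imageFiles,
        Func<string, string> recognize,   // stand-in for the OCR call
        int maxDegreeOfParallelism = 2)   // assumed default; tune per machine
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        Parallel.ForEach(imageFiles, options, imageFile =>
        {
            try
            {
                string text = recognize(imageFile);

                if (string.IsNullOrWhiteSpace(text))
                {
                    Console.WriteLine($"Skipped (no text found): {imageFile}");
                    return; // inside Parallel.ForEach, return moves to the next item
                }

                results[imageFile] = text;
            }
            catch (Exception ex)
            {
                // One bad image should not abort the whole batch.
                Console.WriteLine($"Error processing {imageFile}: {ex.Message}");
            }
        });

        return results;
    }
}
```

The returned dictionary can then be written to disk in a plain foreach loop, which keeps file writes sequential while the recognition itself runs in parallel.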

By following these steps you can automate batch OCR with Aspose.OCR for .NET and export clean text files for downstream processing.
