Automating document processing can improve productivity and accuracy across industries such as legal and financial services, healthcare, and manufacturing. One tool for this task is Aspose.OCR for .NET, which lets developers extract text from scanned documents and images. This tutorial shows how to set up Aspose.OCR for .NET and use it to run batch OCR over a folder of images, writing the extracted text to files that downstream systems can consume.
Complete Example
Below is a complete example (hosted as a gist) demonstrating how to use Aspose.OCR for .NET to perform OCR on multiple images in a directory and save the extracted text to corresponding text files. This example is the source of truth for the steps that follow.
Step-by-Step Guide
Step 1: Initialize the OCR Engine
Create and configure the OCR engine. Set the desired language (English in this example).
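// Step 1: Initialize the OCR Engine
using System;     // Console, Exception (used in Steps 3 and 4)
using System.IO;  // Directory, Path, File (used in Steps 2 and 4)
using Aspose.Ocr;

using (Ocr ocrEngine = new Ocr())
{
    // Set language and other configurations if needed
    ocrEngine.Language = Language.English;
    // (Continue with steps below inside this using block)
}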
Step 2: Load Images for Processing
Define input/output directories, ensure the output folder exists, and enumerate image files.
// Step 2: Load Images for Processing
string inputDirectory = @"path\to\input\images";
string outputDirectory = @"path\to\output\text";

// CreateDirectory does nothing if the folder already exists
Directory.CreateDirectory(outputDirectory);

// Get all files from the input directory (same pattern as the gist)
// TIP: to restrict to specific formats, replace "*.*" with "*.png" or "*.jpg"
string[] imageFiles = Directory.GetFiles(
    inputDirectory,
    "*.*",
    SearchOption.TopDirectoryOnly
);
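Because "*.*" matches every file in the folder, non-image files (for example a stray Thumbs.db or .txt file) would also be passed to the OCR call. A minimal sketch of an extension filter is shown here. The ImageFileFilter class, its method names, and the extension list are illustrative assumptions and not part of Aspose.OCR, so adjust the list to the formats your OCR version accepts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Aspose.OCR): keeps only files whose
// extension looks like a raster image.
public static class ImageFileFilter
{
    // Assumed extension list; adjust to the formats you actually process.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif"
        };

    // True when the path ends in one of the assumed image extensions
    // (case-insensitive, so "scan.PNG" also matches).
    public static bool IsImageFile(string path)
    {
        return ImageExtensions.Contains(Path.GetExtension(path));
    }

    // Enumerates the top level of a directory and returns image files only,
    // sorted by name so that repeated batch runs process files in the same order.
    public static string[] GetImageFiles(string inputDirectory)
    {
        return Directory
            .EnumerateFiles(inputDirectory, "*.*", SearchOption.TopDirectoryOnly)
            .Where(IsImageFile)
            .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

With this helper in place, the Directory.GetFiles call could be replaced by string[] imageFiles = ImageFileFilter.GetImageFiles(inputDirectory); and the rest of the loop stays unchanged.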
Step 3: Perform OCR on Each Image
Iterate over the files and recognize the text in each one using RecognizeImage(string path).
// Step 3: Perform OCR on Each Image
foreach (string imageFile in imageFiles)
{
    try
    {
        // Recognize text from the image (exactly as in the gist)
        string recognizedText = ocrEngine.RecognizeImage(imageFile);
        // Proceed to Step 4: save text to disk...
    }
    catch (Exception ex)
    {
        // Log the failure and move on to the next file
        Console.WriteLine($"Error processing {imageFile}: {ex.Message}");
    }
}
Step 4: Save Extracted Text to Files
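Create a corresponding .txt file for each processed image. This code belongs inside the try block from Step 3, where imageFile and recognizedText are in scope.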
// Step 4: Save Extracted Text to Files
string outputFilePath = Path.Combine(
    outputDirectory,
    Path.GetFileNameWithoutExtension(imageFile) + ".txt"
);
File.WriteAllText(outputFilePath, recognizedText);
Console.WriteLine($"Processed: {imageFile} -> {outputFilePath}");
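One thing to watch with this naming scheme: Path.GetFileNameWithoutExtension maps page.png and page.jpg to the same page.txt, and if subfolders are scanned, files with the same name in different folders would overwrite each other as well. One possible way around this is sketched here, under assumptions: OutputPathMapper, MapToOutputPath, and WriteText are hypothetical names, and the sketch assumes Path.GetRelativePath is available (.NET Core 2.0 or later).

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Aspose.OCR): derives a collision-free
// output path by mirroring the input folder layout and keeping the
// original extension in the file name.
public static class OutputPathMapper
{
    // Maps <inputRoot>/sub/page.png to <outputRoot>/sub/page.png.txt.
    public static string MapToOutputPath(string inputRoot, string outputRoot, string imageFile)
    {
        // Path of the image relative to the input root, e.g. "sub/page.png"
        string relative = Path.GetRelativePath(inputRoot, imageFile);
        return Path.Combine(outputRoot, relative + ".txt");
    }

    // Writes the text, creating any missing subfolder under the output root first.
    public static void WriteText(string outputFilePath, string text)
    {
        string directory = Path.GetDirectoryName(outputFilePath);
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory); // no-op if it already exists
        }
        File.WriteAllText(outputFilePath, text);
    }
}
```

Keeping the original extension in the output name (page.png.txt) is a design choice: it makes the text file traceable to exactly one source image, at the cost of a slightly longer name.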
Tips & Tweaks
- Filter formats: Use patterns like "*.png" or "*.jpg" to skip non-image files.
- Recurse subfolders: Change SearchOption.TopDirectoryOnly to SearchOption.AllDirectories.
- Skip empty outputs: If string.IsNullOrWhiteSpace(recognizedText), log and continue.
- Parallel batches: Use Parallel.ForEach(imageFiles, file => { ... }) for faster runs (mind I/O and licensing).
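The last two tips can be combined in one loop. The following is a rough sketch, not a definitive implementation: ParallelBatch and Run are hypothetical names, and the recognize delegate is a stand-in for whatever OCR call you use, so the sketch itself makes no assumption about the Aspose.OCR API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of Aspose.OCR): runs a caller-supplied
// recognition function over many files in parallel.
public static class ParallelBatch
{
    // recognize: stand-in for the OCR call, mapping an image path to its text.
    // maxDegreeOfParallelism: must be -1 (unlimited) or a positive number.
    // Returns the recognized text per file; empty and failed files are omitted.
    public static IDictionary<string, string> Run(
        IEnumerable<string> imageFiles,
        Func<string, string> recognize,
        int maxDegreeOfParallelism)
    {
        // Thread-safe dictionary, because several workers write to it at once
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(imageFiles, options, file =>
        {
            try
            {
                string text = recognize(file);
                if (string.IsNullOrWhiteSpace(text))
                {
                    Console.WriteLine($"Skipped (no text): {file}");
                    return; // inside Parallel.ForEach, return acts like continue
                }
                results[file] = text;
            }
            catch (Exception ex)
            {
                // One bad file should not abort the whole batch
                Console.WriteLine($"Error processing {file}: {ex.Message}");
            }
        });

        return results;
    }
}
```

A call might look like ParallelBatch.Run(imageFiles, file => ocrEngine.RecognizeImage(file), 4). Whether a single engine instance can safely be shared across threads is not established in this tutorial; if in doubt, create one engine per call inside the delegate. Capping the degree of parallelism keeps disk I/O and any license limits in check.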
By following these steps you can automate batch OCR with Aspose.OCR for .NET and export clean text files for downstream processing.