
PDF documents are central to many business processes, and applications often need programmatic access to the content of scanned files. Because a scanned PDF stores its pages as images rather than as text, extracting that text requires optical character recognition (OCR). In this tutorial, we will explore how to OCR PDF documents and extract text from PDF in C# using the Aspose.OCR for .NET API, which is available for free evaluation.
What You Will Learn
In this article, we will cover the following topics:
- Overview of Aspose.OCR for .NET API
- Steps to OCR PDF and Extract Text
- How to Perform OCR on PDF and Save Text
- Converting OCR PDF to Word
- Converting OCR PDF to JSON
Overview of Aspose.OCR for .NET API
We will use the Aspose.OCR for .NET API, which recognizes text in scanned images, smartphone photos, and screenshots and returns the results in various document formats. In addition to converting images to text, the API can create searchable PDFs from scans and correct spelling mistakes in the recognized text.
The API features the AsposeOcr class, which provides multiple methods for OCR operations. The RecognizePdf(string, DocumentRecognitionSettings) method extracts text from the specified PDF document. The DocumentRecognitionSettings class allows customization of the recognition process, while the RecognitionResult class encapsulates the results of the recognition.
You can download the DLL of the API or install it via NuGet:
PM> Install-Package Aspose.OCR
Steps to OCR PDF and Extract Text in C#
To perform OCR on PDF documents and extract the recognized text, follow these steps:
- Create an instance of the AsposeOcr class.
- Initialize an object of the DocumentRecognitionSettings class.
- Specify the language for OCR.
- Obtain the RecognitionResult by invoking the RecognizePdf() method, passing the PDF file path and the DocumentRecognitionSettings object.
- Loop through the RecognitionResult list to display the identified text.
Here’s an example illustrating how to OCR PDF documents and extract recognized text in C#:
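The following is a minimal sketch of the steps above, not a definitive implementation. It assumes the Aspose.OCR NuGet package is installed and that the release in use still exposes RecognizePdf() as described in this article (later releases may offer a different entry point). The file name sample.pdf is a placeholder, and the loop assumes the returned list holds one RecognitionResult per recognized page.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

class ExtractTextFromPdf
{
    static void Main()
    {
        // Placeholder path to a scanned PDF; replace with your own file.
        string pdfPath = "sample.pdf";

        // Create the OCR engine.
        AsposeOcr api = new AsposeOcr();

        // Configure recognition and set the OCR language to English.
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.Language = Language.Eng;

        // Run OCR on the PDF and collect the results.
        List<RecognitionResult> results = api.RecognizePdf(pdfPath, settings);

        // Print the recognized text of each result (assumed: one per page).
        int pageNumber = 1;
        foreach (RecognitionResult page in results)
        {
            Console.WriteLine($"Page {pageNumber++}: {page.RecognitionText}");
        }
    }
}
```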
How to Perform OCR on PDF and Save Text in C#
To perform OCR on PDF documents and save the recognized text, follow these steps:
- Create an instance of the AsposeOcr class.
- Initialize an object of the DocumentRecognitionSettings class.
- Specify the language for OCR.
- Call the RecognizePdf() method to obtain the RecognitionResult.
- Save the text using the SaveMultipageDocument() method, which requires the output file path, the SaveFormat, and the RecognitionResult object.
Here’s an example demonstrating how to OCR PDF documents and save the recognized text in C#:
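A minimal sketch of these steps might look like the following. It assumes SaveMultipageDocument() is a static method on AsposeOcr that takes the output path, the SaveFormat, and the list of results, and that SaveFormat.Text is the enum member for plain-text output. Both file names are placeholders.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

class SaveOcrTextFromPdf
{
    static void Main()
    {
        // Placeholder input and output paths; replace with your own files.
        string pdfPath = "sample.pdf";
        string outputPath = "sample_multi.txt";

        // Create the OCR engine.
        AsposeOcr api = new AsposeOcr();

        // Configure recognition and set the OCR language to English.
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.Language = Language.Eng;

        // Run OCR on the PDF.
        List<RecognitionResult> results = api.RecognizePdf(pdfPath, settings);

        // Write the recognized text of all pages to a single text file.
        // Assumption: SaveFormat.Text selects plain-text output.
        AsposeOcr.SaveMultipageDocument(outputPath, SaveFormat.Text, results);
    }
}
```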
Converting OCR PDF to Word in C#
To convert scanned PDF documents to Word, follow the same steps as outlined earlier, but specify SaveFormat.Docx in the final step.
Here’s an example illustrating how to OCR PDF and save the recognized text as a Word document in C#:
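As a sketch under the same assumptions as the earlier steps (RecognizePdf() available in the installed release, SaveMultipageDocument() called statically on AsposeOcr), only the SaveFormat value and the output extension change. The file names are placeholders.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

class ConvertScannedPdfToWord
{
    static void Main()
    {
        // Placeholder input and output paths; replace with your own files.
        string pdfPath = "sample.pdf";
        string outputPath = "sample_multi.docx";

        // Create the OCR engine and set the OCR language to English.
        AsposeOcr api = new AsposeOcr();
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.Language = Language.Eng;

        // Run OCR on the PDF.
        List<RecognitionResult> results = api.RecognizePdf(pdfPath, settings);

        // Save the recognized text as a Word document.
        AsposeOcr.SaveMultipageDocument(outputPath, SaveFormat.Docx, results);
    }
}
```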
Converting OCR PDF to JSON in C#
To save the recognized text from PDF documents in a JSON file, follow the same steps, but specify SaveFormat.Json in the final step.
Here’s an example demonstrating how to OCR PDF and save the recognized text as a JSON file in C#:
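The sketch below makes the same assumptions as before (RecognizePdf() available in the installed release, SaveMultipageDocument() called statically on AsposeOcr) and differs only in passing SaveFormat.Json. The file names are placeholders, and the structure of the resulting JSON is determined by the library.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

class ConvertScannedPdfToJson
{
    static void Main()
    {
        // Placeholder input and output paths; replace with your own files.
        string pdfPath = "sample.pdf";
        string outputPath = "sample_multi.json";

        // Create the OCR engine and set the OCR language to English.
        AsposeOcr api = new AsposeOcr();
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.Language = Language.Eng;

        // Run OCR on the PDF.
        List<RecognitionResult> results = api.RecognizePdf(pdfPath, settings);

        // Save the recognition results as a JSON file.
        AsposeOcr.SaveMultipageDocument(outputPath, SaveFormat.Json, results);
    }
}
```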
Get a Free Evaluation License
You can obtain a free temporary license to evaluate the Aspose.OCR for .NET API without any limitations.
Conclusion
In this tutorial, we learned how to perform OCR on PDF documents and extract text from PDF in C#. We also explored how to save the recognized text as TXT, DOCX, and JSON files. For more information on the Aspose.OCR for .NET API, check out its documentation. If you have any questions, feel free to reach out to us on our forum.
See Also
- Convert Screenshot to Text with OCR in C#
- OCR Image to Text and Spelling Correction in C#
- Convert Scanned PDF to Searchable PDF with OCR in C#
With the Aspose.OCR for .NET API, you can add PDF OCR to C# applications such as invoice processing and forms handling, extracting text from scanned PDFs and saving it in the formats covered in this tutorial. The free evaluation is a practical way to check whether the library fits your OCR needs before integrating it into a project.