
How to Perform OCR Operations on PDF Documents inside .NET Applications

by sher azam on Nov 25, 2015 | Category: C#


This technical tip shows how .NET developers can perform OCR operations on PDF documents inside .NET applications. Aspose.OCR APIs accept only images as input, so performing OCR on a PDF document takes two Aspose APIs working together: Aspose.Pdf converts each PDF page to an image, and Aspose.OCR performs the OCR operation on that image. This article demonstrates how to combine Aspose.Pdf for .NET and Aspose.OCR for .NET to perform OCR on PDF documents.
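The two-stage flow described above (render a page to an image, then recognize the text in that image) can be sketched independently of either library. In the minimal C# sketch below, the class `PdfOcrPipeline` and its `renderPage` and `recognize` delegates are hypothetical placeholders, not Aspose APIs: in a real application `renderPage` would wrap the Aspose.Pdf rendering step and `recognize` would wrap the Aspose.OCR recognition step. The sketch assumes 1-based page numbers, matching the loop in the sample code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the two-stage PDF OCR pipeline.
// renderPage and recognize are placeholders for the Aspose.Pdf and
// Aspose.OCR steps; they are NOT part of either library's API.
public static class PdfOcrPipeline
{
    public static List<string> Run(
        int pageCount,
        Action<int, Stream> renderPage,  // stage 1: write page N (1-based) as an image into the stream
        Func<Stream, string> recognize)  // stage 2: return the text recognized in the image stream
    {
        var results = new List<string>();
        for (int page = 1; page <= pageCount; page++)
        {
            // One short-lived in-memory buffer per page, as in the sample code
            using (var imageStream = new MemoryStream())
            {
                renderPage(page, imageStream);

                // Rewind so the recognition stage reads the image from the start
                imageStream.Position = 0;

                results.Add(recognize(imageStream));
            }
        }
        return results;
    }
}
```

Keeping each stage behind a delegate lets the page loop be exercised with stub delegates, or lets the JPEG rendering be swapped for another image format, without changing the loop itself.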

The sample code below shows how to perform OCR on a PDF document, one page at a time.

//[C# Code Sample]


using System;

class Program
{
    static void Main()
    {
        //Create an instance of Document to load the PDF
        var pdfDocument = new Aspose.Pdf.Document("D:/sample.pdf");

        //Create an instance of OcrEngine for recognition
        var ocrEngine = new Aspose.OCR.OcrEngine();

        //Iterate over the pages of the PDF (the Pages collection is 1-based)
        for (int pageNumber = 1; pageNumber <= pdfDocument.Pages.Count; pageNumber++)
        {
            //Create a MemoryStream to hold the page image temporarily
            using (var imageStream = new System.IO.MemoryStream())
            {
                //Create a Resolution object with the DPI value
                var resolution = new Aspose.Pdf.Devices.Resolution(300);

                //Create a JPEG device from the resolution and a quality value
                //in the range [0-100], where 100 is the maximum
                var jpegDevice = new Aspose.Pdf.Devices.JpegDevice(resolution, 100);

                //Convert the page to a JPEG image and save it to the stream
                jpegDevice.Process(pdfDocument.Pages[pageNumber], imageStream);

                //Rewind the stream so the OCR engine reads the image from the beginning
                imageStream.Position = 0;

                //Set the Image property of OcrEngine to the stream obtained in the previous step
                ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(imageStream, Aspose.OCR.ImageStreamFormat.Jpg);

                //Perform the OCR operation on one page at a time
                if (ocrEngine.Process())
                {
                    Console.WriteLine(ocrEngine.Text);
                }
            }
        }
    }
}



//[VB.NET Code Sample]

Imports System

Module Program
    Sub Main()
        'Create an instance of Document to load the PDF
        Dim pdfDocument = New Aspose.Pdf.Document("D:/sample.pdf")

        'Create an instance of OcrEngine for recognition
        Dim ocrEngine = New Aspose.OCR.OcrEngine()

        'Iterate over the pages of the PDF (the Pages collection is 1-based)
        For pageNumber As Integer = 1 To pdfDocument.Pages.Count
            'Create a MemoryStream to hold the page image temporarily
            Using imageStream = New System.IO.MemoryStream()
                'Create a Resolution object with the DPI value
                Dim resolution = New Aspose.Pdf.Devices.Resolution(300)

                'Create a JPEG device from the resolution and a quality value
                'in the range [0-100], where 100 is the maximum
                Dim jpegDevice = New Aspose.Pdf.Devices.JpegDevice(resolution, 100)

                'Convert the page to a JPEG image and save it to the stream
                jpegDevice.Process(pdfDocument.Pages(pageNumber), imageStream)

                'Rewind the stream so the OCR engine reads the image from the beginning
                imageStream.Position = 0

                'Set the Image property of OcrEngine to the stream obtained in the previous step
                ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(imageStream, Aspose.OCR.ImageStreamFormat.Jpg)

                'Perform the OCR operation on one page at a time
                If ocrEngine.Process() Then
                    Console.WriteLine(ocrEngine.Text)
                End If
            End Using
        Next pageNumber
    End Sub
End Module



Overview: Aspose.OCR for .NET

Aspose.OCR for .NET is a character recognition component that lets developers add OCR functionality to their ASP.NET web applications, web services and Windows applications. It provides a simple set of classes for controlling character recognition tasks and works with image files (for example BMP and TIFF) from within the developer's own application. It lets developers extract text from images quickly and easily, saving the time and effort of developing an OCR solution from scratch.

- Homepage of Aspose.OCR for .NET: http://www.aspose.com/.net/ocr-component.aspx
- Download Aspose.OCR for .NET: http://www.aspose.com/community/files/51/.net-components/aspose.ocr-for-.net/default.aspx


