
How to Search Text in a PDF using Regular Expression & Add Hyperlink over It

by sher azam   on Feb 25, 2015   Category: C#


This technical tip shows how .NET developers can search for text inside a PDF file using a regular expression and add hyperlinks over the matches in their .NET applications. To find a phrase and add a hyperlink over it, first pass the regular expression as a parameter to the TextFragmentAbsorber constructor, then assign a TextSearchOptions object that specifies whether the search string is treated as a regular expression. After that, get the matching phrases from the TextFragments collection and loop through the matches to get their rectangular dimensions, change the foreground color to blue (optional, to make the text look like a hyperlink), and create a link using the PdfContentEditor class's CreateWebLink(..) method. The samples also draw an underline beneath each match with CreateLine(..). Lastly, save the updated PDF using the Save method of the PdfContentEditor object.
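Before running a pattern against a PDF, it can help to check what it matches on plain text. The following is a minimal sketch that uses the standard System.Text.RegularExpressions classes, not Aspose.Pdf. It assumes that the absorber interprets the pattern the same way the .NET Regex class does. The PatternCheck class, the FindMatches helper and the sample string are hypothetical names and data introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Same pattern as in the C# sample: 'D', seven lowercase letters, then a colon.
    public const string Pattern = "D[a-z]{7}:";

    // Returns every substring of the input that the pattern matches.
    public static string[] FindMatches(string input)
    {
        MatchCollection matches = Regex.Matches(input, Pattern);
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            results[i] = matches[i].Value;
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text, not taken from any real PDF.
        string sample = "Document: report.pdf, Date: today, Download: here";
        foreach (string hit in FindMatches(sample))
        {
            Console.WriteLine(hit); // prints "Document:" and then "Download:"
        }
    }
}
```

"Date:" is skipped because only three lowercase letters follow the "D". Dropping the trailing colon from the pattern would also match eight-letter runs that are not followed by a colon.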

The following code snippet shows how to search for text inside a PDF file using a regular expression and add hyperlinks over the matches.

//C# Code Sample

using System;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

// create absorber object to find all instances of the input search phrase
TextFragmentAbsorber absorber = new TextFragmentAbsorber("D[a-z]{7}:");
// enable regular expression search
absorber.TextSearchOptions = new TextSearchOptions(true);
// create the content editor
PdfContentEditor editor = new PdfContentEditor();
// bind source PDF file
editor.BindPdf("c:/pdftest/25741-out.pdf");
// accept the absorber for the first page
editor.Document.Pages[1].Accept(absorber);

// empty dash pattern and line-ending arrays for the underline
int[] dashArray = { };
String[] LEArray = { };
System.Drawing.Color blue = System.Drawing.Color.Blue;

// loop through the fragments
foreach (TextFragment textFragment in absorber.TextFragments)
{
    // color the matched text blue so it looks like a hyperlink (optional)
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue;
    // integer rectangle covering the match, slightly padded
    System.Drawing.Rectangle rect = new System.Drawing.Rectangle((int)textFragment.Rectangle.LLX,
        (int)Math.Round(textFragment.Rectangle.LLY), (int)Math.Round(textFragment.Rectangle.Width + 2),
        (int)Math.Round(textFragment.Rectangle.Height + 1));
    Enum[] actionName = new Enum[2] { Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_AttachFile, Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_ExtractPages };
    // create the web link over the match on page 1
    editor.CreateWebLink(rect, "http://www.aspose.com", 1, blue, actionName);
    // draw an underline just below the match
    editor.CreateLine(rect, "", (float)textFragment.Rectangle.LLX + 1, (float)textFragment.Rectangle.LLY - 1,
        (float)textFragment.Rectangle.URX, (float)textFragment.Rectangle.LLY - 1, 1, 1, blue, "S", dashArray, LEArray);
}
// save and close the document
editor.Save("c:/pdftest/TextReplaced_with_Links.pdf");
editor.Close();

//VB.NET Code Sample

' create absorber object to find all instances of the input search phrase
Dim absorber As Aspose.Pdf.Text.TextFragmentAbsorber = New Aspose.Pdf.Text.TextFragmentAbsorber("D[a-z]{7}")
' enable regular expression search
absorber.TextSearchOptions = New Aspose.Pdf.Text.TextSearchOptions(True)
' create the content editor
Dim editor As Aspose.Pdf.Facades.PdfContentEditor = New Aspose.Pdf.Facades.PdfContentEditor()
' bind source PDF file
editor.BindPdf("c:/pdftest/25741-out.pdf")
' accept the absorber for the first page
editor.Document.Pages(1).Accept(absorber)

' empty dash pattern and line-ending arrays for the underline
Dim dashArray() As Integer = {}
Dim LEArray() As String = {}
Dim blue As System.Drawing.Color = System.Drawing.Color.Blue

' loop through the fragments
For Each textFragment As Aspose.Pdf.Text.TextFragment In absorber.TextFragments
    ' color the matched text blue so it looks like a hyperlink (optional)
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue
    ' integer rectangle covering the match, slightly padded
    Dim rect As System.Drawing.Rectangle = New System.Drawing.Rectangle(CType(textFragment.Rectangle.LLX, Integer),
        CType(Math.Round(textFragment.Rectangle.LLY), Integer), CType(Math.Round(textFragment.Rectangle.Width + 2), Integer),
        CType(Math.Round(textFragment.Rectangle.Height + 1), Integer))
    Dim actionName() As System.Enum = New System.Enum() {Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_AttachFile, Aspose.Pdf.InteractiveFeatures.PredefinedAction.Document_ExtractPages}
    ' create the web link over the match on page 1
    editor.CreateWebLink(rect, "http://www.aspose.com", 1, blue, actionName)
    ' draw an underline just below the match (coordinates are Single, matching the float casts in the C# sample)
    editor.CreateLine(rect, "", CType(textFragment.Rectangle.LLX + 1, Single), CType(textFragment.Rectangle.LLY - 1, Single),
        CType(textFragment.Rectangle.URX, Single), CType(textFragment.Rectangle.LLY - 1, Single), 1, 1, blue, "S", dashArray, LEArray)
Next

' save and close the document
editor.Save("c:/pdftest/TextReplaced_with_Links.pdf")
editor.Close()
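
Both samples turn the fragment's floating-point bounds into an integer System.Drawing.Rectangle with a mix of casts, rounding, and a small fixed padding. As an alternative to that approach (not the method used in the samples above), the sketch below rounds the lower-left corner down and the upper-right corner up, so the integer rectangle always encloses the original bounds. The LinkBounds class and its Enclose method are hypothetical names, and the numbers in the usage example are made up for illustration.

```csharp
using System;
using System.Drawing;

public static class LinkBounds
{
    // Builds the smallest integer rectangle that fully covers the given
    // floating-point bounds (lower-left x, lower-left y, width, height).
    public static Rectangle Enclose(double llx, double lly, double width, double height)
    {
        int left = (int)Math.Floor(llx);             // round the left edge down
        int bottom = (int)Math.Floor(lly);           // round the bottom edge down
        int right = (int)Math.Ceiling(llx + width);  // round the right edge up
        int top = (int)Math.Ceiling(lly + height);   // round the top edge up
        return new Rectangle(left, bottom, right - left, top - bottom);
    }

    public static void Main()
    {
        // Hypothetical bounds, not read from a real PDF.
        Rectangle rect = Enclose(72.6, 700.4, 55.3, 10.2);
        Console.WriteLine(rect); // prints {X=72,Y=700,Width=56,Height=11}
    }
}
```

In the samples, such a helper could be called with textFragment.Rectangle.LLX, LLY, Width and Height in place of the inline casts. Whether the fixed padding of the original is still wanted on top of this is a matter of taste.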



More about Aspose.Pdf for .NET

- Homepage of Aspose.Pdf for .NET: http://www.aspose.com/.net/pdf-component.aspx

- Download Aspose.Pdf for .NET at: http://www.aspose.com/community/files/51/.net-components/aspose.pdf-for-.net/default.aspx


