There are several ways to read a PDF. In this article, we will look at Azure AI Form Recognizer and how to use it to read a PDF from .NET/C# code. First, let's discuss what Form Recognizer is.
Form Recognizer is a cloud-based Microsoft Azure service that lets developers extract information from forms and documents. It supports formats such as PDFs, images, and scanned documents. Form Recognizer can be accessed through the REST API or through SDKs for .NET, Python, and JavaScript.
Let's first create a Form Recognizer resource in the Azure Portal.
Step 1: Go to the Azure Portal and create a Form Recognizer resource.
Step 2: Open Keys and Endpoint and copy the key and the endpoint. We will use both in the code.
Step 3: Create your application (console or web, on .NET Framework or .NET Core). Add the Azure.AI.FormRecognizer NuGet package to your project, for example with the command dotnet add package Azure.AI.FormRecognizer, then copy the code below and use it.
The sample below passes a static Azure Blob PDF URL. If you need a dynamic PDF, follow the previous article to upload the file to blob storage and use that URL instead.
// Requires the Azure.AI.FormRecognizer NuGet package.
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
using Microsoft.AspNetCore.Http;

// Note: the uploaded file parameter is not used in this sample;
// the PDF is read from a fixed blob URL instead.
public async Task<string> Index(IFormFile file)
{
    var pdfURL = "https://myblog.blob.core.windows.net/sample/Constellation.pdf";
    var uri = new Uri(pdfURL);

    // Key and endpoint copied from "Keys and Endpoint" in the Azure Portal.
    var key = "yourkeyhere";
    var endPoint = "https://yoururl.cognitiveservices.azure.com/";
    AzureKeyCredential credential = new(key);
    var client = new DocumentAnalysisClient(new Uri(endPoint), credential);

    // Analyze the PDF with the prebuilt general document model and wait for completion.
    var operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed,
        "prebuilt-document", uri);
    AnalyzeResult result = operation.Value;

    // Read the key-value pairs
    foreach (DocumentKeyValuePair kvp in result.KeyValuePairs)
    {
        if (kvp.Value == null)
        {
            Console.WriteLine($"Found key with no value: '{kvp.Key.Content}'");
        }
        else
        {
            Console.WriteLine($"Found key-value pair: '{kvp.Key.Content}' and " +
                $"'{kvp.Value.Content}'");
        }
    }

    // Read the tables (the loop simply does nothing when there are none)
    for (int i = 0; i < result.Tables.Count; i++)
    {
        DocumentTable table = result.Tables[i];
        Console.WriteLine($"Table {i} has {table.RowCount} rows " +
            $"and {table.ColumnCount} columns.");
        foreach (DocumentTableCell cell in table.Cells)
        {
            Console.WriteLine($"Cell ({cell.RowIndex}, {cell.ColumnIndex}) " +
                $"has kind '{cell.Kind}' and content: '{cell.Content}'.");
        }
    }

    // The sample only writes to the console, so there is nothing to return.
    return string.Empty;
}
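The sample above hard-codes the key and endpoint, which is fine for a quick test but risky if the code ends up in source control. One possible alternative is to read both from environment variables. The following is only a minimal sketch: the class name FormRecognizerSettings and the variable names FORM_RECOGNIZER_KEY and FORM_RECOGNIZER_ENDPOINT are made up for this example and are not defined by Azure or the SDK.

```csharp
using System;

public static class FormRecognizerSettings
{
    // Hypothetical variable names, chosen for this sketch only.
    public const string KeyVariable = "FORM_RECOGNIZER_KEY";
    public const string EndpointVariable = "FORM_RECOGNIZER_ENDPOINT";

    // readVariable is injected so the lookup can be swapped out (for example in tests);
    // pass Environment.GetEnvironmentVariable for real use.
    public static (Uri Endpoint, string Key) Load(Func<string, string> readVariable)
    {
        string key = readVariable(KeyVariable);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException($"Set the {KeyVariable} environment variable.");
        }

        string rawEndpoint = readVariable(EndpointVariable);
        if (!Uri.TryCreate(rawEndpoint, UriKind.Absolute, out Uri endpoint))
        {
            throw new InvalidOperationException($"Set {EndpointVariable} to an absolute URL.");
        }

        return (endpoint, key);
    }
}
```

In the sample, the two hard-coded strings could then be replaced by calling FormRecognizerSettings.Load(Environment.GetEnvironmentVariable) and passing the result on as new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key)).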
You can see the output in the console window, and the same code can be pointed at any PDF you like. Form Recognizer costs around $1.50 per 1,000 pages. For detailed pricing, follow the Microsoft link below.
Form Recognizer Pricing
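To get a feel for what a batch of documents might cost, the rough per-page figure quoted above can be turned into a small calculation. This is only a back-of-the-envelope sketch: the PricingEstimate class is hypothetical, and the default rate is this article's approximate figure rather than an official price, so check the pricing page for the model and tier you actually use.

```csharp
using System;

public static class PricingEstimate
{
    // Assumption: the article's approximate rate, in USD per 1,000 pages.
    public const decimal ApproxUsdPerThousandPages = 1.5m;

    // Returns the estimated cost in USD for the given number of pages.
    public static decimal EstimateCostUsd(
        int pages, decimal usdPerThousandPages = ApproxUsdPerThousandPages)
    {
        if (pages < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pages));
        }

        // Decimal arithmetic avoids floating-point rounding on money values.
        return pages / 1000m * usdPerThousandPages;
    }
}
```

For example, PricingEstimate.EstimateCostUsd(2500) evaluates to 3.75, that is, about $3.75 for 2,500 pages at the assumed rate.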
If you want to read PDFs and images with Azure Cognitive Services, you can follow the link below.