Friday, 24 March 2023

Read PDF with Azure AI - Form Recognizer in .NET C#

There are multiple ways to read a PDF. In this article, we will look at Azure AI Form Recognizer and learn how to read a PDF in .NET/C# code with it. First of all, let's discuss what Form Recognizer is.

Form Recognizer is a cloud-based service by Microsoft Azure that allows developers to extract information from forms and documents, including PDFs, images, and scanned documents. Form Recognizer can be accessed through its REST API or through the SDKs for .NET, Python, and JavaScript.
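If you would rather call the REST API than use the SDK, the flow is: send a POST request that starts the analysis, then poll the URL returned in the Operation-Location header until the job finishes. The sketch below assumes the Form Recognizer v3.0 GA REST API (api-version 2022-08-31), the `Ocp-Apim-Subscription-Key` header, and a `urlSource` JSON body; verify these against the REST reference for your resource. The class and method names (`FormRecognizerRest`, `AnalyzeFromUrlAsync`, etc.) are hypothetical, and it uses only the .NET base class library.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class FormRecognizerRest
{
    // Assumption: the Form Recognizer v3.0 GA REST API version.
    public const string ApiVersion = "2022-08-31";

    // Builds the analyze URL for a model such as "prebuilt-document".
    public static string BuildAnalyzeUri(string endpoint, string modelId) =>
        $"{endpoint.TrimEnd('/')}/formrecognizer/documentModels/{modelId}:analyze?api-version={ApiVersion}";

    // Request body asking the service to download the document from a URL.
    public static string BuildUrlSourceBody(string documentUrl) =>
        JsonSerializer.Serialize(new { urlSource = documentUrl });

    // Starts the analysis, then polls the Operation-Location URL until the job ends.
    // Returns the raw JSON; the extracted data is under "analyzeResult".
    public static async Task<string> AnalyzeFromUrlAsync(
        HttpClient http, string endpoint, string key, string modelId, string documentUrl)
    {
        using var start = new HttpRequestMessage(HttpMethod.Post, BuildAnalyzeUri(endpoint, modelId))
        {
            Content = new StringContent(BuildUrlSourceBody(documentUrl), Encoding.UTF8, "application/json")
        };
        start.Headers.Add("Ocp-Apim-Subscription-Key", key);

        using HttpResponseMessage accepted = await http.SendAsync(start);
        accepted.EnsureSuccessStatusCode(); // the service is expected to answer 202 Accepted
        if (!accepted.Headers.TryGetValues("Operation-Location", out IEnumerable<string>? values))
            throw new InvalidOperationException("Operation-Location header missing.");
        string operationUrl = values.First();

        // Poll once per second, giving up after 60 attempts.
        for (int attempt = 0; attempt < 60; attempt++)
        {
            await Task.Delay(TimeSpan.FromSeconds(1));
            using var poll = new HttpRequestMessage(HttpMethod.Get, operationUrl);
            poll.Headers.Add("Ocp-Apim-Subscription-Key", key);
            using HttpResponseMessage response = await http.SendAsync(poll);
            response.EnsureSuccessStatusCode();

            string json = await response.Content.ReadAsStringAsync();
            using JsonDocument doc = JsonDocument.Parse(json);
            string? status = doc.RootElement.GetProperty("status").GetString();
            if (status == "succeeded") return json;
            if (status == "failed") throw new InvalidOperationException(json);
            // Any other status (e.g. "notStarted", "running"): keep polling.
        }
        throw new TimeoutException("Analysis did not finish within the polling window.");
    }
}
```

With the key and endpoint from the portal (Step 2 below), a call would look like `await FormRecognizerRest.AnalyzeFromUrlAsync(new HttpClient(), endPoint, key, "prebuilt-document", pdfURL)`. The SDK used in the rest of this article wraps this same start-and-poll pattern for you.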

Let's first create a Form Recognizer resource in the Azure Portal.

Step 1: Go to the Azure Portal and create a Form Recognizer resource.

Step 2: Go to Keys and Endpoint and copy the Key and the Endpoint; we will use both in the code.
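The sample in Step 3 pastes the key straight into the code, which is fine for a quick test. A common alternative is to read both values from environment variables. The sketch below assumes the hypothetical variable names FORM_RECOGNIZER_KEY and FORM_RECOGNIZER_ENDPOINT and a hypothetical `FormRecognizerSettings` class; it fails fast if either value is missing or the endpoint is not an absolute URL.

```csharp
#nullable enable
using System;

public static class FormRecognizerSettings
{
    // Hypothetical variable names; use whatever fits your deployment.
    public const string KeyVariable = "FORM_RECOGNIZER_KEY";
    public const string EndpointVariable = "FORM_RECOGNIZER_ENDPOINT";

    // Reads the key and endpoint copied from "Keys and Endpoint" in the portal.
    public static (string Key, Uri Endpoint) Load()
    {
        string? key = Environment.GetEnvironmentVariable(KeyVariable);
        string? endpoint = Environment.GetEnvironmentVariable(EndpointVariable);

        if (string.IsNullOrWhiteSpace(key))
            throw new InvalidOperationException($"{KeyVariable} is not set.");
        if (!Uri.TryCreate(endpoint, UriKind.Absolute, out Uri? endpointUri))
            throw new InvalidOperationException($"{EndpointVariable} is not a valid absolute URL.");

        return (key, endpointUri);
    }
}
```

The result plugs into the SDK client directly, e.g. `var (key, endpoint) = FormRecognizerSettings.Load();` followed by `new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key))`.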

Step 3: Create your application, either a .NET Framework or a .NET Core project, and add the Azure.AI.FormRecognizer NuGet package to it (version 4.0.0 or later, which provides DocumentAnalysisClient). Then copy the code below and use it.


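In the sample below, I am passing a static PDF URL from Azure Blob Storage. If you need a dynamic PDF, follow the previous article to upload the file to a storage blob and use that URL instead. The service downloads the file itself, so the URL must be readable by it (a private blob needs a SAS token).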


    // Namespaces needed (place these at the top of the file):
    // using Azure;
    // using Azure.AI.FormRecognizer.DocumentAnalysis;
    // using Microsoft.AspNetCore.Http;

    // Controller action. The 'file' parameter is not used in this sample:
    // it analyzes a fixed PDF stored in Azure Blob Storage.
    public async Task<string> Index(IFormFile file)
    {
        var pdfURL = "https://myblog.blob.core.windows.net/sample/Constellation.pdf";
        var uri = new Uri(pdfURL);

        // Key and endpoint copied from "Keys and Endpoint" in the Azure Portal.
        // Do not keep real keys in source code.
        var key = "yourkeyhere";
        var endPoint = "https://yoururl.cognitiveservices.azure.com/";

        AzureKeyCredential credential = new(key);
        var client = new DocumentAnalysisClient(new Uri(endPoint), credential);

        // "prebuilt-document" extracts text, tables and key-value pairs.
        // WaitUntil.Completed makes the call return only after the analysis finishes.
        var operation = await client.AnalyzeDocumentFromUriAsync(
            WaitUntil.Completed, "prebuilt-document", uri);
        AnalyzeResult result = operation.Value;

        // Read key-value pairs
        foreach (DocumentKeyValuePair kvp in result.KeyValuePairs)
        {
            if (kvp.Value == null)
            {
                Console.WriteLine($"Found key with no value: '{kvp.Key.Content}'");
            }
            else
            {
                Console.WriteLine($"Found key-value pair: '{kvp.Key.Content}' and " +
                    $"'{kvp.Value.Content}'");
            }
        }

        // Read the tables
        for (int i = 0; i < result.Tables.Count; i++)
        {
            DocumentTable table = result.Tables[i];
            Console.WriteLine($"Table {i} has {table.RowCount} rows " +
                $"and {table.ColumnCount} columns.");

            foreach (DocumentTableCell cell in table.Cells)
            {
                Console.WriteLine($"Cell ({cell.RowIndex}, {cell.ColumnIndex}) " +
                    $"has kind '{cell.Kind}' and content: '{cell.Content}'.");
            }
        }

        // Return the full text extracted from the document
        return result.Content;
    }
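The sample prints every result to the console. If you need the data in code instead, you can collect the table cells into a grid and the key-value pairs into a dictionary. This is a sketch with hypothetical names (`ResultHelpers`, `ToGrid`, `ToDictionary`); to keep it runnable without the SDK, it takes plain tuples instead of `DocumentTableCell` and `DocumentKeyValuePair`. It also assumes two simplifications: a cell that spans several rows or columns is placed only at its top-left position, and when a key repeats, the first value wins.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ResultHelpers
{
    // Lays cells out in a rowCount x columnCount grid; empty positions stay "".
    public static string[,] ToGrid(int rowCount, int columnCount,
        IEnumerable<(int RowIndex, int ColumnIndex, string Content)> cells)
    {
        var grid = new string[rowCount, columnCount];
        for (int r = 0; r < rowCount; r++)
            for (int c = 0; c < columnCount; c++)
                grid[r, c] = string.Empty;

        foreach (var (row, column, content) in cells)
            grid[row, column] = content; // spanning cells: top-left position only
        return grid;
    }

    // Collects key-value pairs; a missing value becomes "" and the first duplicate key wins.
    public static Dictionary<string, string> ToDictionary(
        IEnumerable<(string Key, string? Value)> pairs)
    {
        var result = new Dictionary<string, string>();
        foreach (var (key, value) in pairs)
        {
            if (!result.ContainsKey(key))
                result.Add(key, value ?? string.Empty);
        }
        return result;
    }
}
```

With the SDK result (and `using System.Linq;`), the calls would be `ResultHelpers.ToGrid(table.RowCount, table.ColumnCount, table.Cells.Select(c => (c.RowIndex, c.ColumnIndex, c.Content)))` and `ResultHelpers.ToDictionary(result.KeyValuePairs.Select(kvp => (kvp.Key.Content, kvp.Value?.Content)))`.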

You can see the output in the console window; to read a different PDF, just change the URL. Form Recognizer costs around $1.5 per 1,000 pages, although the exact price depends on the model you use. For detailed pricing, follow the Microsoft link below:

Form Recognizer Pricing
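The cost estimate is simple arithmetic: pages / 1,000 × price per 1,000 pages. Using the $1.5 figure above, 20,000 pages come to 20 × $1.5 = $30. The helper below (a hypothetical name) takes the price as a parameter, since the rate depends on the model and tier; use the current figure from the pricing page.

```csharp
using System;

public static class CostEstimate
{
    // Estimated cost = (pages / 1,000) x price per 1,000 pages, using decimal for money.
    public static decimal Estimate(int pages, decimal pricePerThousandPages)
    {
        if (pages < 0) throw new ArgumentOutOfRangeException(nameof(pages));
        return pages / 1000m * pricePerThousandPages;
    }
}
```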


If you want to read PDFs and images with Azure Cognitive Services, follow the link below:


Read PDF with Azure Cognitive Service
