One objective of the AI-102 Designing and Implementing a Microsoft Azure AI Solution exam is understanding and working with the Computer Vision API, part of Azure’s Cognitive Services offering. Among its features is the ability to extract text from images and PDFs. This article explains how that process works and walks through practical examples.

Computer vision, as its name suggests, is the science of getting computers to understand and process visual data. It’s a subset of AI that extracts information from images or multi-dimensional data. The Microsoft Azure Computer Vision API makes these advanced capabilities available as a service, simplifying how developers can incorporate image and video processing into their applications.

Overview of Text Extraction from Images or PDFs

Text extraction, also known as Optical Character Recognition (OCR), is one of the primary functions provided by the Computer Vision API. This allows for unstructured data in the form of graphical content to be translated into machine-readable text, unlocking a vast array of possibilities.

This is useful for tasks such as scanning invoices, reading license plates or IDs, and digitizing printed documents.

Using the Computer Vision Service for Text Extraction

Azure’s Computer Vision service provides two key methods for text extraction:

  • OCR – Best suited for images with small amounts of text.
  • Read API – Ideal for larger amounts of text spread across several lines, or even whole documents.
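The two bullets above can be read as a simple rule of thumb. The sketch below encodes it as a small helper; the function name and its flags are hypothetical and exist only for illustration, not as part of any Azure SDK.

```python
def choose_text_extraction_method(
    is_pdf_or_tiff: bool = False,
    has_handwriting: bool = False,
    multi_page: bool = False,
    dense_text: bool = False,
) -> str:
    """Return "read" or "ocr" following the rule of thumb in this article.

    Hypothetical helper: the flags are illustrative inputs, not API parameters.
    """
    # Documents, handwriting, multiple pages, or dense text point to the Read API
    if is_pdf_or_tiff or has_handwriting or multi_page or dense_text:
        return "read"
    # A small amount of printed text in a single image suits the OCR operation
    return "ocr"


print(choose_text_extraction_method())                     # ocr
print(choose_text_extraction_method(has_handwriting=True)) # read
print(choose_text_extraction_method(is_pdf_or_tiff=True))  # read
```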

OCR Method

This method suits simple, sparse text in images, but it can struggle with densely packed text such as a document page or a full paragraph. It takes an image as input and returns the recognized text organized into regions, lines, and words, each with bounding box coordinates.

Here’s a quick example of how easy it is to leverage the OCR operation. This example utilizes the Python SDK for the Computer Vision API:

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Enter your subscription key and endpoint
SUBSCRIPTION_KEY = ""
ENDPOINT = ""

# create a client
computervision_client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(SUBSCRIPTION_KEY))

# Specify the image's URL
remote_image_url = ""

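# Run OCR on the image (recognize_printed_text takes a URL;
# the *_in_stream variant expects an open binary stream instead)
ocr_results = computervision_client.recognize_printed_text(url=remote_image_url)

# Print the results, one recognized line per row
for region in ocr_results.regions:
    for line in region.lines:
        print(" ".join([word.text for word in line.words]))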

In the above code, we first import the necessary libraries and then initialize our client with the subscription key and endpoint. We then provide the URL of the image and run the OCR operation on it. Lastly, we loop through the results and print them.
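The SDK objects mirror the JSON that the OCR operation returns. As a minimal sketch, assuming a response with the regions, lines, and words hierarchy and bounding boxes encoded as "left,top,width,height" strings, the helpers below flatten a result into one entry per line. The payload and the function names are made up for illustration and are not real service output.

```python
from typing import Dict, List, Tuple


def parse_box(box: str) -> Tuple[int, int, int, int]:
    """Turn an OCR "left,top,width,height" string into a tuple of ints."""
    left, top, width, height = (int(part) for part in box.split(","))
    return left, top, width, height


def flatten_ocr(result: Dict) -> List[Dict]:
    """Collect one entry per line: its joined text and its bounding box."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            text = " ".join(word["text"] for word in line.get("words", []))
            lines.append({"text": text, "box": parse_box(line["boundingBox"])})
    return lines


# Hypothetical payload for illustration only
sample_ocr_result = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "20,10,300,80",
            "lines": [
                {
                    "boundingBox": "20,10,300,30",
                    "words": [
                        {"boundingBox": "20,10,120,30", "text": "Invoice"},
                        {"boundingBox": "150,10,170,30", "text": "1001"},
                    ],
                },
                {
                    "boundingBox": "20,60,200,30",
                    "words": [{"boundingBox": "20,60,200,30", "text": "Total"}],
                },
            ],
        }
    ],
}

for entry in flatten_ocr(sample_ocr_result):
    print(entry["box"], entry["text"])
```

Keeping the box next to the text makes it straightforward to highlight or crop the matching area of the source image later.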

Read API Method

The Read API is designed to extract printed and handwritten text from images and documents, including multi-page ones. It works asynchronously: you submit a request, receive an operation URL in the response, and then query that URL for the result. Here’s a Python-based example that calls the Read REST endpoint:

import time
import requests

# Specify the remote image or document URL
read_image_url = ""

# Build the Read endpoint URL (the v3.2 path is an assumption; match it to your API version)
read_url = ENDPOINT.rstrip("/") + "/vision/v3.2/read/analyze"

# Submit the request to Read and get the operation location
read_headers = {'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY}
read_response = requests.post(read_url, headers=read_headers, json={"url": read_image_url})
read_response.raise_for_status()
read_operation_location = read_response.headers["Operation-Location"]

# Poll the operation location until the analysis finishes
while True:
    read_result = requests.get(read_operation_location, headers=read_headers).json()
    if read_result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(1)

# Print the detected text from every page
for page in read_result.get("analyzeResult", {}).get("readResults", []):
    for line in page["lines"]:
        print(line["text"])

In the Read API example, we specify the image URL and send it in a POST request to the Read endpoint. We then take the operation location from the response header and send GET requests to it until the operation reports that it has succeeded or failed. Finally, we loop through the pages of the result and print each line.
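Because the Read operation runs asynchronously, the result may not be ready on the first GET. A minimal sketch of a bounded polling helper follows, assuming the status values "notStarted", "running", "succeeded", and "failed". The helper name, its parameters, and the fake responses are illustrative; in real use, `fetch` would wrap the GET request to the operation location.

```python
import time
from typing import Callable, Dict


def wait_for_read_result(
    fetch: Callable[[], Dict],
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until the operation reaches a final status or attempts run out."""
    for _ in range(max_attempts):
        result = fetch()
        if result.get("status") in ("succeeded", "failed"):
            return result
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"Read operation not finished after {max_attempts} attempts")


# Simulated responses stand in for real HTTP calls in this demonstration
fake_responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"readResults": []}},
])
final = wait_for_read_result(lambda: next(fake_responses), interval=0, sleep=lambda s: None)
print(final["status"])
```

Capping the number of attempts keeps a stalled operation from blocking the caller indefinitely, and injecting `fetch` and `sleep` lets the loop be tested without network access.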

Choose between OCR and the Read API based on the input: OCR for small amounts of text in an image, and the Read API for dense text, handwriting, or multi-page documents. As the examples above show, adding text extraction to an application with Azure’s Computer Vision service takes only a few calls.

These examples cover the basics of extracting text from images and PDFs with the Azure Computer Vision service, a core skill for implementing AI solutions on the Microsoft Azure platform.

Practice Test

True or False: Computer Vision Service is incapable of extracting text from images and PDFs.

  • True
  • False

Answer: False

Explanation: Computer Vision service is specifically designed to extract text and recognize features from images and PDFs.

Can you utilize Computer Vision Service to recognize handwriting as well as printed text in documents?

  • True
  • False

Answer: True

Explanation: Computer Vision Service offers the Read API for recognizing both printed and handwritten text in images, including from PDFs.

What type of OCR does the Computer Vision service support?

  • A. Printed OCR
  • B. Handwritten OCR
  • C. Supported languages OCR
  • D. All of the above

Answer: D. All of the above

Explanation: Computer Vision supports Printed OCR, Handwritten OCR, as well as OCR for multiple supported languages.

Can you use the Computer Vision service to convert a PDF into an editable text document?

  • True
  • False

Answer: True

Explanation: The Computer Vision service can extract text from PDFs and images, allowing you to convert PDFs into editable text documents.

How many languages does the Computer Vision service OCR support?

  • A. 5
  • B. 10
  • C. 25
  • D. More than 25

Answer: D. More than 25

Explanation: The Computer Vision service OCR supports more than 25 languages, making it versatile and usable across different regions.

Can Computer Vision service interpret barcodes in PDF documents?

  • True
  • False

Answer: False

Explanation: Although the Computer Vision service is adept at processing and extracting text from images and PDFs, it doesn’t interpret barcodes.

Can the Computer Vision service distinguish between different types of printed documents?

  • True
  • False

Answer: True

Explanation: The Computer Vision service can recognize and differentiate various types of printed documents, such as articles and invoices.

One of the primary uses of the Computer Vision service is to analyze and describe the content of images. T/F?

  • True
  • False

Answer: True

Explanation: The Computer Vision service not only extracts text but also analyzes the content of the image to provide a description.

Which API does the Computer Vision service offer for recognizing both printed and handwritten text in images?

  • A. Compute API
  • B. Read API
  • C. Write API
  • D. None of the above

Answer: B. Read API

Explanation: The Read API in the Computer Vision service is specifically designed for recognizing both printed and handwritten text.

Can an image be uploaded directly to Computer Vision service for text extraction?

  • True
  • False

Answer: True

Explanation: The Computer Vision service allows images to be uploaded directly for text extraction and analysis.

Can Computer Vision service be utilized for recognizing celebrities and landmarks in a given image?

  • True
  • False

Answer: True

Explanation: The Computer Vision service can recognize celebrities and landmarks in a given image through its API.

Computer Vision service comes with features that cannot detect and extract printed text in multiple languages. T/F?

  • True
  • False

Answer: False

Explanation: Computer Vision service supports the extraction of printed text in multiple languages, making it a versatile tool.

Are there any limitations with the Read API in terms of the size of the PDF file it can process?

  • True
  • False

Answer: True

Explanation: The Read API has limitations in terms of the number of pages (2000 pages) it can process, and the file size (50MB for direct raw data).

The Computer Vision API returns the extracted text, bounding boxes, and confidence score. T/F?

  • True
  • False

Answer: True

Explanation: The Computer Vision API returns the extracted text, the bounding boxes of the recognized elements, and a confidence score for the recognition.

An image with a single line of text can be processed directly using the ‘Recognize Text’ API in the Computer Vision service. T/F?

  • True
  • False

Answer: False

Explanation: The legacy ‘Recognize Text’ operation has been superseded by the Read API, which should be used instead, including for a single line of text.

Interview Questions

What is the primary functionality of Azure’s Computer Vision service?

Azure’s Computer Vision service uses artificial intelligence to analyze and extract information from images, handwritten or printed text, and objects in images or video. It can recognize celebrities, landmarks, brand logos, and extract printed and handwritten text from images and PDFs.

Can Computer Vision service extract text from images in languages other than English?

Yes, the Computer Vision service supports text extraction in many languages, not just English.

What is OCR in the context of Azure AI?

OCR or Optical Character Recognition is a technology used in Azure AI to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.

What is the Read API in Computer Vision service?

The Read API performs text recognition (OCR): it detects and extracts printed or handwritten text from images or PDFs, and it can recognize text in different languages.

What is the main difference between the ‘Recognize Text’ and ‘Read’ operations in Computer Vision Service?

Recognize Text is optimized for detecting printed text in images, for example, in a landscape or on a road sign, while the Read operation is optimized for a larger body of text, such as a paragraph or article.

Does the Computer Vision service provide a way to structure the data extracted from an image or PDF?

Yes, the Read operation not only extracts the text from the image but also provides information about the structure of the content. For example, it identifies the pages, lines, and words.
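To make that structure concrete, here is a minimal sketch that walks a Read result page by page, line by line, and word by word, and flags words with a low confidence score. It assumes the `analyzeResult` / `readResults` layout used by the REST example earlier in this article; the payload, the helper names, and the 0.8 threshold are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical payload for illustration only
sample_read_result = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {"page": 1, "lines": [
                {"text": "Invoice 1001", "words": [
                    {"text": "Invoice", "confidence": 0.99},
                    {"text": "1001", "confidence": 0.97},
                ]},
            ]},
            {"page": 2, "lines": [
                {"text": "Thank you", "words": [
                    {"text": "Thank", "confidence": 0.62},
                    {"text": "you", "confidence": 0.95},
                ]},
            ]},
        ]
    },
}


def iter_words(read_result: Dict) -> Iterator[Tuple[int, int, str, Optional[float]]]:
    """Yield (page number, line index, word text, confidence) for every word."""
    for page in read_result.get("analyzeResult", {}).get("readResults", []):
        for line_index, line in enumerate(page.get("lines", [])):
            for word in line.get("words", []):
                yield page.get("page"), line_index, word["text"], word.get("confidence")


def low_confidence_words(read_result: Dict, threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (page, text, confidence) for words scoring below the threshold."""
    return [
        (page, text, conf)
        for page, _, text, conf in iter_words(read_result)
        if conf is not None and conf < threshold
    ]


print(low_confidence_words(sample_read_result))
```

A list like this can feed a manual review step for words the service was less sure about.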

How can you improve the accuracy of text extraction from images using the Computer Vision service?

Accuracy can be improved by providing clear images with high contrast, ample lighting, and clearly visible text. The text should be right-side-up and in focus.

Is it possible to retrieve bounding box coordinates for extracted text using the Computer Vision service API?

Yes, the API returns bounding box coordinates for each recognized word or line, representing its position in the image.
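The Read operation reports each bounding box as eight numbers, the x and y coordinates of the four corners, so that rotated text can be described. As a sketch, the hypothetical helper below reduces such a polygon to an axis-aligned rectangle, which is often easier to draw or crop; the sample coordinates are invented.

```python
from typing import Sequence, Tuple


def polygon_to_rect(bounding_box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [x1, y1, x2, y2, x3, y3, x4, y4] to (left, top, width, height)."""
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers: four (x, y) corner points")
    xs = bounding_box[0::2]  # every even index is an x coordinate
    ys = bounding_box[1::2]  # every odd index is a y coordinate
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


# Slightly rotated box, invented for illustration
print(polygon_to_rect([10, 20, 110, 22, 109, 52, 9, 50]))  # (9, 20, 101, 32)
```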

Can the Computer Vision service extract text from handwritten documents?

Yes, the Computer Vision service can extract text from both printed and handwritten documents. However, the accuracy may depend on the legibility of the handwriting.

What are the typical use cases for Azure Computer Vision service?

Typical use cases include receipt or invoice processing, identity document recognition, form processing, content moderation, brand detection in images, and building accessibility applications by integrating OCR capabilities.

Can the Computer Vision service detect symbols or only text characters?

The Read API in Computer Vision service can extract a wide range of printed text, including special characters and symbols, from images and multi-page PDF documents.

In what scenarios would you use the Batch Read File method in Azure Computer Vision service?

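Batch Read File is the name the asynchronous Read operation carried in earlier versions of the API. Use it (or Read, in current versions) when you need to extract text from a multi-page PDF document or a TIFF image, because the synchronous OCR operation does not accept these formats.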

Can the Computer Vision service be used for real-time text extraction from images or PDFs?

While the Read API provides faster operation than previous versions, it is not designed for real-time text extraction. The time to get the results depends on the size and complexity of the document.

What limits apply to text extraction in the Computer Vision service?

As of October 2021, the Read API supports images up to 17 megapixels and documents up to 2,000 pages long, with each page limited to 17 megapixels. Limits vary by API version and pricing tier.

Is it possible to train the Computer Vision service for better text extraction results?

No, the Computer Vision service is a pre-trained model. For customized training, you could consider Azure’s Form Recognizer service, which allows custom training on your specific documents.
