The Computer Vision API is part of Azure Cognitive Services, a family of cloud-based services that give developers access to intelligent algorithms that can see, hear, speak, understand, and even make decisions. Two of its features are relevant to extracting text from images:
- OCR (Optical Character Recognition): This feature detects text in an image and returns the recognized characters. However, it is limited in its ability to detect handwritten text.
- Read: The ‘read’ feature can be beneficial for handwriting recognition as it identifies and reads printed or handwritten text from images or larger documents.
Steps To Use Azure Computer Vision API
Step-1: Create an Azure Account
An account on the Azure portal is a prerequisite for using Azure’s Computer Vision API. New users receive a credit on registration, which can be used to explore Azure’s paid features.
Step-2: Set Up the Computer Vision API Service
In the Azure portal, create a new resource and select the ‘AI + Machine Learning’ category. Then select ‘Computer Vision’, fill in the necessary details, and create the resource.
Step-3: Retrieve Required Keys to Access the API
After setting up the Computer Vision API, retrieve the ‘Key’ and ‘Endpoint’ from the resource management page. The endpoint tells the client where to send requests, and the key authenticates them.
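One common way to keep the key and endpoint out of source code is to read them from environment variables. The following is a minimal sketch, not an Azure requirement: the variable names VISION_ENDPOINT and VISION_KEY and the helper load_vision_settings are hypothetical names chosen for illustration.

```python
import os


def load_vision_settings(environ=os.environ):
    """Return (endpoint, key) read from environment variables.

    VISION_ENDPOINT and VISION_KEY are illustrative names chosen for this
    sketch; Azure does not require these particular names.
    """
    endpoint = environ.get("VISION_ENDPOINT", "").strip()
    key = environ.get("VISION_KEY", "").strip()
    if not endpoint or not key:
        raise RuntimeError("Set VISION_ENDPOINT and VISION_KEY before running.")
    # Drop a trailing slash so the endpoint has one predictable form.
    return endpoint.rstrip("/"), key
```

The returned pair can then be passed to whichever client is used to call the service, so the key never appears in the script itself.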
Step-4: Send the Image to the API for Analysis
Azure provides SDKs in multiple languages to interact with the Computer Vision API. Here is a Python code sample showcasing how to send a request to the API:
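import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

# Set up the API client (placeholders: substitute the Endpoint and Key from Step-3)
client = ComputerVisionClient("<your-endpoint>", CognitiveServicesCredentials("<your-key>"))

# Open the image as a binary stream and submit it to the Read API
with open('path_to_image', 'rb') as image_stream:
    read_response = client.read_in_stream(image_stream, raw=True)

# The Operation-Location header is a URL whose last segment is the operation ID
operation_location = read_response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Poll until the asynchronous operation is no longer pending
while True:
    result = client.get_read_result(operation_id)
    if result.status not in (OperationStatusCodes.not_started, OperationStatusCodes.running):
        break
    time.sleep(1)

# Output each recognized line, page by page
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)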
Step-5: Model Response Interpretation
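The results are returned as JSON that the client must interpret. The example below uses the layout of the synchronous OCR operation: regions contain lines, lines contain words, and each ‘boundingBox’ lists left, top, width, and height in pixels. The Read operation returns a similar hierarchy of pages, lines, and words, and also reports confidence scores.
Example of JSON Response: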
{
  "language": "en",
  "textAngle": 0.0,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "365,211,880,179",
      "lines": [
        {
          "boundingBox": "102,275,492,60",
          "words": [
            {
              "boundingBox": "102,275,215,60",
              "text": "This"
            },
            {
              "boundingBox": "317,277,277,58",
              "text": "test."
            }
          ]
        }
      ]
    }
  ]
}
The ‘text’ fields contain the recognized text.
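To turn a response of this shape into plain strings, the ‘text’ of each word can be joined per line. The sketch below assumes the regions/lines/words layout shown in the example; extract_lines is a hypothetical helper name, and the sample is a trimmed copy of the response above.

```python
import json


def extract_lines(ocr_response):
    """Return one string per line by joining the 'text' of its words."""
    lines = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Trimmed copy of the example response (bounding boxes omitted).
sample = json.loads("""
{
  "language": "en",
  "regions": [
    {"lines": [{"words": [{"text": "This"}, {"text": "test."}]}]}
  ]
}
""")

print(extract_lines(sample))  # prints ['This test.']
```

Using .get with an empty default means a response with no detected regions yields an empty list rather than an error.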
These methods make converting handwritten text using Computer Vision API in Azure a simplified process. By leveraging Azure’s powerful AI capabilities, developers can implement highly efficient, fast, and reliable document digitization features in their applications.
Practice Test
True or False: Computer Vision service can analyze handwritten text on images.
- True
- False
Answer: True
Explanation: One of the capabilities of the Azure Computer Vision service is Optical Character Recognition (OCR), which can extract printed and handwritten text from images.
What type of services does Computer Vision API offer? (Multiple select)
- A) Facial Recognition
- B) Image Classification
- C) Handwriting Analysis
- D) Speech Recognition
Answer: A, B, C
Explanation: The Computer Vision API provides a suite of services including facial recognition, image classification and handwriting analysis. Speech recognition is out of its scope and is handled by the Speech Service in Azure.
The ____ can be used to analyze a local image for printed and handwritten text using Azure.
- A) Text Analytics API
- B) Speech service
- C) Read API
- D) QnA Maker Service
Answer: C, Read API
Explanation: Read API, a part of the Computer Vision service, can analyze a local image for printed and handwritten text on Azure.
True or False: Computer Vision API cannot extract text from a noisy background.
- True
- False
Answer: False
Explanation: Azure’s Computer Vision API has robust capabilities and can often extract text even from images with noisy backgrounds.
Which of the following languages does the Computer Vision API support for handwriting recognition?
- A) English only
- B) English and Spanish only
- C) All common European languages
- D) All languages
Answer: A, English only
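Explanation: Early versions of Azure’s handwriting recognition supported English text only. Later versions of the Read API added handwriting support for further languages, so check the current language-support documentation.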
True or False: Azure’s Computer Vision API can process only one image at a time.
- True
- False
Answer: False
Explanation: The Computer Vision API can process up to 20 images in a batch operation.
The Read API uses ____ models to extract printed and handwritten text from images.
- A) Rule-based
- B) Machine Learning
- C) Fixed-parameter
- D) Hard-coded
Answer: B, Machine Learning
Explanation: Azure’s Read API uses Machine Learning models to analyze and extract printed and handwritten text from images.
Which of the following image formats does Azure’s Computer Vision API support for extracting handwritten text?
- A) JPEG
- B) PNG
- C) TIFF
- D) All of the above
Answer: D, All of the above
Explanation: Azure’s Computer Vision API supports various image formats, including JPEG, PNG, and TIFF, for extracting handwritten text.
True or False: The quality of image does not impact the accuracy of text extraction in Azure’s Computer Vision API.
- True
- False
Answer: False
Explanation: The better the quality of the image, the more accurate the text extraction.
Directly reading text from a noisy background is a part of which feature provided by Azure Computer Vision?
- A) Optical Character Recognition (OCR)
- B) Text Analytics
- C) Read API Handwriting Recognition
- D) None of the above
Answer: A, Optical Character Recognition (OCR)
Explanation: OCR is the feature of Azure Computer Vision that deals with reading text directly from images, even sometimes with noisy backgrounds.
Interview Questions
What is the Computer Vision service within Microsoft Azure?
Microsoft Azure’s Computer Vision service is part of Azure’s Cognitive Services and lets developers analyze images and extract information from them in a variety of ways. Among other things, it uses advanced algorithms to identify and extract text from handwritten or printed documents and images.
How does the Computer Vision service convert handwritten text into machine-readable text?
This is accomplished by a process known as Optical Character Recognition (OCR). It involves image preprocessing, character recognition, and post-processing to transform the handwritten text into machine-readable format.
What are some primary use cases for the Computer Vision service’s text recognition capabilities?
This feature can be useful in various applications like digitizing handwritten historical documents, processing forms or invoices, or enabling the searchability of text in image-based data.
Can the Computer Vision service recognize text in multiple languages?
Yes, the Computer Vision’s Read API supports multiple languages for text extraction, including English, Spanish, French, German, Italian, Dutch, Portuguese, and more.
What is the difference between the OCR API and the Read API within the Computer Vision service?
The OCR API is designed for images with small amounts of text, and it runs synchronously, providing results immediately. In contrast, the Read API is intended for larger documents and it runs asynchronously, able to analyze an entire image in detail and return the text line by line.
What types of inputs does the Computer Vision service accept for text recognition?
The service can analyze images in various formats including JPEG, PNG, GIF, BMP, and TIFF. The images can be provided as either a raw byte array or as a URL to an image.
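Before sending anything to the service, a client can decide whether an input is a URL or a local file and check its extension against the formats listed above. This is a minimal sketch under stated assumptions: classify_input and SUPPORTED_EXTENSIONS are hypothetical names, and judging the format by file extension is a simplification (a URL without an extension is rejected here, which is stricter than the service itself).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the formats named in the text: JPEG, PNG, GIF, BMP, TIFF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def classify_input(source):
    """Return ('url', source) or ('file', source); raise ValueError otherwise."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path so query strings do not hide the extension.
    path = parsed.path if is_url else source
    extension = PurePosixPath(path.replace("\\", "/")).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported or missing image extension: {extension!r}")
    return ("url" if is_url else "file"), source
```

A caller could then send URL inputs by reference and open file inputs as binary streams.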
What are some limitations of the Computer Vision service in recognizing handwritten text?
While the service is robust, it may struggle with text that is extremely distorted, very small, or in an unusual font. Additionally, it may have difficulty with documents with complex layouts or low image quality.
Can the Computer Vision service detect handwriting orientation?
Yes, the Computer Vision service can detect and correct the orientation of handwritten text in an image up to 360 degrees.
Is the Computer Vision OCR system capable of recognizing cursive handwriting?
The system performs best with print handwriting. While it can recognize some cursive text, the results are less reliable, and individual results may vary greatly depending on the legibility of the handwriting.
Can the Computer Vision service handle real-time processing of handwritten texts?
Yes, with the use of the Read API, the Computer Vision service is capable of analyzing complex documents in real time.
Is the Azure Computer Vision service capable of extracting handwritten text from a document with mixed typed and handwritten text?
Yes, Azure’s Computer Vision service can extract text from documents with both typed and handwritten text using the Read API’s advanced recognition capabilities.
Are there any specific recommendations for improving the performance of Azure’s Computer Vision service for handwritten text recognition?
Yes. Using high-quality images and ensuring that the handwritten text is clear and legible can dramatically improve recognition accuracy. Ideally, the text in the image should also be around 20 pixels tall.
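The text-height guideline can be checked roughly after the fact by inspecting the bounding boxes in an OCR-style response. The sketch below assumes the ‘left,top,width,height’ bounding-box strings shown in the example response and treats a word's box height as an approximation of its text height; small_words and parse_bounding_box are hypothetical helper names, and the second sample word is made up for illustration.

```python
def parse_bounding_box(box):
    """Parse an OCR-style 'left,top,width,height' string into four ints."""
    left, top, width, height = (int(part) for part in box.split(","))
    return left, top, width, height


def small_words(ocr_response, min_height=20):
    """Return the text of words whose bounding box is shorter than min_height pixels."""
    flagged = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                _, _, _, height = parse_bounding_box(word["boundingBox"])
                if height < min_height:
                    flagged.append(word["text"])
    return flagged


# First word is taken from the example response; the second is invented.
sample_response = {"regions": [{"lines": [{"words": [
    {"boundingBox": "102,275,215,60", "text": "This"},
    {"boundingBox": "10,10,40,12", "text": "tiny"},
]}]}]}

print(small_words(sample_response))  # prints ['tiny']
```

If many words are flagged, rescanning or photographing the document at a higher resolution is likely to help more than any client-side processing.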
How does the Computer Vision service handle text recognition in noisy or dark images?
The Computer Vision service uses a range of preprocessing techniques to enhance the quality of images before text recognition. That said, excessively noisy or dark images may still impact the accuracy of the OCR.
Do the OCR and Read APIs return the same types of results?
The OCR and Read APIs return similar results, but the Read API additionally returns the text recognized in a more structured hierarchy of lines and words. This makes it easier to work with reading results from larger documents.
How can the Read API be used to track the progress of a long-running OCR operation?
The Read API is asynchronous, and when a request is made, it returns an operation location (URL) in the response headers. This URL can be called to get the status and final result of the OCR operation.
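The polling pattern described here can be sketched independently of any SDK. In the sketch below, get_status is a hypothetical callable standing in for the real status request; it is assumed to return a dict whose 'status' value is one of the Read API states notStarted, running, failed, or succeeded. The interval and attempt limit are arbitrary illustrative defaults.

```python
import time


def operation_id_from_url(operation_location):
    """Return the last path segment of the Operation-Location URL."""
    return operation_location.rstrip("/").split("/")[-1]


def wait_for_result(get_status, operation_id, interval=1.0,
                    max_attempts=30, sleep=time.sleep):
    """Poll get_status(operation_id) until the status is no longer pending.

    get_status is a stand-in for the SDK or REST call and must return a
    dict with a 'status' key. Raises TimeoutError if the operation is
    still pending after max_attempts polls.
    """
    for _ in range(max_attempts):
        result = get_status(operation_id)
        if result["status"] not in ("notStarted", "running"):
            return result  # 'succeeded' or 'failed'
        sleep(interval)
    raise TimeoutError(
        f"Operation {operation_id} did not finish after {max_attempts} polls"
    )
```

Capping the number of attempts keeps a stuck operation from blocking the caller forever, and passing sleep as a parameter makes the loop easy to exercise without real delays.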