Creating and running an indexer is a core task when implementing a Microsoft Azure AI solution, and one you should be comfortable with when preparing for the AI-102 exam. An indexer is a crawler that extracts searchable data and metadata from a data source and loads it into a search index.
Understanding Azure Indexer
An indexer is a feature of Azure Cognitive Search (since renamed Azure AI Search) that links a data source to an index. It connects to the data source, reads the data, and writes new or changed content to the search index, which keeps the index in step with the data source.
The Azure Search Indexer supports different types of data sources, such as Azure Cosmos DB, Azure Blob Storage, and Azure SQL Database.
Creating an Indexer
To create an indexer through the REST API or an SDK, you need a search service, a data source, and an index already set up in Azure. The portal’s “Import data” wizard only requires the search service, because it can create the data source and defines the index as part of the same flow:
- Go to the Azure portal.
- Open your Azure Cognitive Search service.
- On the search service page, select “Import data”.
- In the wizard, choose or create the data source, then define the target index.
- Set up the indexer configuration: give the indexer a name, specify a schedule (if it should run automatically), and set any indexer parameters.
- Click “Submit” to create the indexer.
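Here is an example of how to create an indexer with the .NET SDK (the Azure.Search.Documents package). The endpoint and key are placeholders:
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Placeholder endpoint and admin key; replace with your own values.
var indexerClient = new SearchIndexerClient(
    new Uri("https://<service-name>.search.windows.net"),
    new AzureKeyCredential("<admin-api-key>"));

var indexer = new SearchIndexer("myindexer", "mydatasource", "myindex")
{
    Schedule = new IndexingSchedule(TimeSpan.FromDays(1)) // Re-index every 24 hours
};
indexerClient.CreateOrUpdateIndexer(indexer);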
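The same indexer can also be created through the REST API. The following Python sketch builds (but does not send) the corresponding PUT request using only the standard library. The service URL and API key are assumed placeholders, and `build_create_indexer_request` is a hypothetical helper name, not part of any SDK.

```python
import json
import urllib.request

# Assumed placeholders; substitute your own service name and admin key.
SERVICE_URL = "https://my-search-service.search.windows.net"
API_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"


def build_create_indexer_request(name, data_source, index, interval="P1D"):
    """Build (but do not send) a PUT request that creates or updates an indexer."""
    body = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        # ISO 8601 duration; P1D means one run every 24 hours.
        "schedule": {"interval": interval},
    }
    return urllib.request.Request(
        url=f"{SERVICE_URL}/indexers/{name}?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="PUT",
    )


request = build_create_indexer_request("myindexer", "mydatasource", "myindex")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the service.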
Running an Indexer
An indexer runs once as soon as it is created and then, if a schedule is defined, automatically at each scheduled interval. You can also trigger a run manually at any time. To do this:
- Go to the Azure portal.
- Go to the indexer details page under the search service.
- Click “Run” to execute the indexer.
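Alternatively, you can start the indexer on demand through the REST API with an HTTP POST request that carries an admin key in the api-key header. A successful call returns 202 Accepted:
POST https://[service name].search.windows.net/indexers/myindexer/run?api-version=2020-06-30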
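As an illustration, the POST call above could be assembled in Python as follows. This is a minimal sketch that only constructs the request; the service URL and key are assumed placeholders and `build_run_indexer_request` is a hypothetical helper.

```python
import urllib.request

# Assumed placeholders; substitute your own service name and admin key.
SERVICE_URL = "https://my-search-service.search.windows.net"
API_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"


def build_run_indexer_request(indexer_name):
    """Build (but do not send) the POST request that starts an indexer run."""
    return urllib.request.Request(
        url=f"{SERVICE_URL}/indexers/{indexer_name}/run?api-version={API_VERSION}",
        data=b"",  # Run Indexer takes no request body
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


run_request = build_run_indexer_request("myindexer")
print(run_request.get_method(), run_request.full_url)
# To actually send it: urllib.request.urlopen(run_request)
```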
Indexer Status
After running the indexer, you can check its status to see whether the run succeeded or failed. The status includes the result of the last run, the number of documents processed, the number of documents that failed, and any error messages. To view it in the portal:
- Go to the Azure portal.
- Go to the indexer details page in your search service.
- Check the status panel for information on the indexer run.
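The same information is available programmatically from the Get Indexer Status REST API. The sketch below parses a hand-written sample shaped like that response; the values are illustrative only, and `summarize_status` is a hypothetical helper. It assumes `itemsProcessed` counts every item the run attempted, including failed ones.

```python
import json

# Hand-written sample shaped like a Get Indexer Status response (illustrative values).
SAMPLE_RESPONSE = """
{
  "status": "running",
  "lastResult": {
    "status": "success",
    "errorMessage": null,
    "startTime": "2024-01-01T00:00:00Z",
    "endTime": "2024-01-01T00:01:30Z",
    "itemsProcessed": 120,
    "itemsFailed": 2
  },
  "executionHistory": []
}
"""


def summarize_status(payload):
    """Reduce a status payload to the fields most useful for monitoring."""
    last = payload.get("lastResult") or {}  # absent until the first run completes
    processed = last.get("itemsProcessed", 0)
    failed = last.get("itemsFailed", 0)
    return {
        "indexer_status": payload.get("status"),
        "last_run_status": last.get("status"),
        "succeeded": processed - failed,
        "failed": failed,
    }


summary = summarize_status(json.loads(SAMPLE_RESPONSE))
print(summary)
```

Note the two levels of status: `status` describes the indexer as a whole, while `lastResult.status` describes one specific run.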
In conclusion, indexers offer an easy way to keep Azure Cognitive Search indexes updated with the latest data changes, which makes them a useful tool when designing and implementing a Microsoft Azure AI solution. Using them well takes practice, so spend time creating, running, and monitoring indexers in the Azure portal until the workflow feels familiar.
Practice Test
True or False: An indexer in Azure Cognitive Search is responsible for moving data from the data source to the search index.
- True
- False
Answer: True.
Explanation: An indexer in Azure Cognitive Search is a crawler that extracts searchable content and metadata from various sources and populates the index with this content.
Which of the following are required to create and run an indexer in Azure Cognitive Search?
- A. Data Source
- B. Skillset
- C. Index
- D. Search Service
Answer: A, C, D.
Explanation: To create and run an indexer, you need a data source (where the content lives), an index (where the searchable content is stored), and a search service (which hosts the index and the indexer). A skillset is only needed for AI enrichment and is not required to run an indexer.
True or False: An indexer in Azure Cognitive Search can only run once.
- True
- False
Answer: False.
Explanation: An indexer can be scheduled to run at specific intervals or can be manually run as many times as necessary.
Multiple select: Which are valid status values for an individual indexer execution (run) in Azure Cognitive Search?
- A. Error
- B. Running
- C. In Progress
- D. Success
Answer: C, D.
Explanation: The status of an indexer execution is one of InProgress, Success, TransientFailure, or Reset (the REST API also reports persistentFailure). Error and Running are values of the indexer’s overall status, not of an individual run.
True or False: Azure Cognitive Search indexer cannot extract data from Azure Blob Storage.
- True
- False
Answer: False.
Explanation: Azure Cognitive Search indexer can extract data from various sources including Azure Blob Storage.
Which of the following retrieves status and execution history for an indexer in Azure?
- A. Get Indexer Status API
- B. Retrieve Indexer Execution History API
- C. Retrieve Indexer API
- D. Get Indexer Execution History API
Answer: A.
Explanation: The Get Indexer Status API returns both the current status and the execution history of an indexer. There is no separate execution history API.
True or False: An indexer can only crawl structured data for search.
- True
- False
Answer: False.
Explanation: An indexer in Azure Cognitive Search can crawl both structured data such as tables and unstructured data such as text.
Multiple select: Which of the following are methods to run an indexer in Azure Cognitive Search?
- A. Run Indexer API
- B. Schedule
- C. Manual invocation
- D. Automatic invocation based on data change detection
Answer: A, B, C.
Explanation: You can run an indexer on demand, either by calling the Run Indexer API or by invoking it manually from the portal, or on a recurring schedule. Change detection does not trigger a run; it determines which items a run processes.
True or False: You can configure an indexer to use a field mapping to map a field from the data source to an index field with a different name.
- True
- False
Answer: True.
Explanation: Field mappings in an indexer definition let you map a source field to a destination index field with a different name.
Single select: Which is not a valid data source object for Azure Cognitive Search Indexer?
- A. Azure SQL Database
- B. Cosmos DB
- C. Azure Cognitive Services
- D. Azure Blob Storage
Answer: C. Azure Cognitive Services.
Explanation: Azure Cognitive Services isn’t a data source type. Indexers can use Azure SQL Database, Azure Cosmos DB, and Azure Blob Storage, among others, as data sources.
Interview Questions
What is the primary function of an Azure Search indexer?
The primary function of an Azure Search indexer is to automate data ingestion from various data sources into an Azure Search index.
Can you manually control when an indexer runs in Azure Search?
Yes. You can start a run on demand with the ‘Run Indexer’ API or the “Run” button in the Azure portal, independently of any schedule.
What is the main difference between an indexer and a data source in Azure Search?
A data source represents the content to be indexed while an indexer is a crawler that extracts searchable data and metadata from the data source.
How can you handle errors when running an Azure Search indexer?
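Indexer errors are reported through the Azure portal, the REST API (Get Indexer Status), and the SDKs. Each failed run includes detailed error messages that you can use to diagnose and troubleshoot the problem. The indexer parameters maxFailedItems and maxFailedItemsPerBatch control how many failed documents a run tolerates before it is marked as failed.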
What types of data sources can Azure Search indexers operate on?
Azure Search indexers can operate on several types of data sources including Azure SQL Database, Azure Cosmos DB, Azure Blob Storage, and Azure Table Storage.
How can you configure an indexer to delete documents in an index?
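You configure a ‘data deletion detection policy’ on the data source that the indexer reads from. The most common option is a soft-delete policy: rows or documents are flagged as deleted in the source (for example, through a dedicated column), and the indexer removes the matching documents from the index on its next run.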
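A deletion detection policy is declared in the data source definition. The following sketch builds such a definition for the REST API, assuming a hypothetical Azure SQL table named `Products` with an `IsDeleted` column; the connection string is a placeholder and `build_sql_data_source` is a hypothetical helper.

```python
import json


def build_sql_data_source(name, connection_string, table, soft_delete_column):
    """Return a data source definition that flags soft-deleted rows for removal."""
    return {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table},
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": soft_delete_column,
            # Rows whose column holds this value are removed from the index.
            "softDeleteMarkerValue": "true",
        },
    }


definition = build_sql_data_source(
    name="mydatasource",
    connection_string="<connection-string>",  # placeholder
    table="Products",  # hypothetical table
    soft_delete_column="IsDeleted",  # hypothetical column
)
print(json.dumps(definition, indent=2))
```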
In Azure Search, is it possible to run multiple indexers on the same data source?
Yes, it is possible to run multiple indexers on the same data source in Azure Search. This may be necessary if you want to handle different types of data differently, or to target different indexes.
Can you modify an indexer definition in Azure Search once it’s been created?
Yes, you can modify the definition of an Azure Search indexer after it’s been created, although the changes will only affect future indexing operations.
What are the scheduling options for running an indexer in Azure Search?
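In Azure Search, an indexer can run on a recurring schedule or only on demand. The schedule is defined by an interval, from a minimum of 5 minutes to a maximum of 24 hours, and an optional start time. Hourly and daily runs are therefore possible, but a weekly interval is not.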
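In the REST API the schedule interval is written as an ISO 8601 duration, for example PT15M or PT2H. The sketch below converts a number of minutes into that form and rejects values outside the range of 5 minutes to 24 hours; `schedule_interval` is a hypothetical helper written for illustration.

```python
MIN_MINUTES = 5      # shortest interval the service accepts
MAX_MINUTES = 1440   # longest interval the service accepts (24 hours)


def schedule_interval(minutes):
    """Format a schedule interval in minutes as an ISO 8601 duration string."""
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"interval must be between {MIN_MINUTES} and {MAX_MINUTES} minutes"
        )
    days, remainder = divmod(minutes, 1440)
    hours, mins = divmod(remainder, 60)
    text = "P"
    if days:
        text += f"{days}D"
    if hours or mins:
        text += "T"
        if hours:
            text += f"{hours}H"
        if mins:
            text += f"{mins}M"
    return text


for value in (15, 90, 120, 1440):
    print(value, "->", schedule_interval(value))
```

The resulting string goes into the `schedule.interval` property of the indexer definition.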
What’s the role of field mappings in Azure Search Indexer?
Field mappings in an Azure Search indexer define how fields from a data source map to fields in a search index.
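To make the idea concrete, the following local simulation applies a list of mappings to one source record. It only imitates the renaming step; the search service applies field mappings itself during indexing. The mapping entries use the same `sourceFieldName` and `targetFieldName` properties as the `fieldMappings` array of an indexer definition, while the field names and `apply_field_mappings` are hypothetical.

```python
def apply_field_mappings(source_doc, field_mappings):
    """Rename source fields to index fields; unmapped fields keep their names."""
    renames = {m["sourceFieldName"]: m["targetFieldName"] for m in field_mappings}
    return {renames.get(key, key): value for key, value in source_doc.items()}


# Same shape as the "fieldMappings" array in an indexer definition.
field_mappings = [
    {"sourceFieldName": "_id", "targetFieldName": "id"},
    {"sourceFieldName": "desc", "targetFieldName": "description"},
]

source_doc = {"_id": "42", "desc": "Blue kettle", "price": 19.99}
index_doc = apply_field_mappings(source_doc, field_mappings)
print(index_doc)  # {'id': '42', 'description': 'Blue kettle', 'price': 19.99}
```

Fields whose names already match an index field, such as `price` here, need no explicit mapping.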
How does the Azure Search Indexer handle changes in the data source?
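An indexer picks up changes through the change detection policy defined on the data source, such as a high water mark column or SQL integrated change tracking, so that each run processes only new or modified items instead of re-indexing everything.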
Is it possible to pause or stop an Azure Search indexer while it’s running?
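There is no built-in operation to pause and resume a run that is in progress. To keep an indexer from running again, you can disable it by setting its ‘disabled’ property to true or remove its schedule.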
What is the ‘high watermark’ in relation to Azure Search indexer?
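In Azure Search, a ‘high watermark’ is the highest value of a version or timestamp column that the indexer has processed so far. The column must increase with every insert or update (for example, _ts in Azure Cosmos DB or a rowversion column in Azure SQL Database), so on the next run the indexer only reads items whose value is greater than the stored mark.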
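The mechanism can be illustrated with a toy simulation. This is not the service’s implementation, only a sketch of the idea under the assumption that every row carries a `version` value that grows with each change; `incremental_run` and the sample rows are hypothetical.

```python
def incremental_run(rows, high_water_mark):
    """Return the rows changed since the last run and the updated high water mark."""
    changed = sorted(
        (row for row in rows if row["version"] > high_water_mark),
        key=lambda row: row["version"],
    )
    new_mark = changed[-1]["version"] if changed else high_water_mark
    return changed, new_mark


rows = [
    {"id": "a", "version": 1},
    {"id": "b", "version": 2},
    {"id": "c", "version": 3},
]

# First run: nothing indexed yet, so every row is picked up.
first_batch, mark = incremental_run(rows, high_water_mark=0)

# Row "a" is updated in the source, which bumps its version.
rows[0]["version"] = 4

# Second run: only the updated row is newer than the stored mark.
second_batch, mark = incremental_run(rows, mark)

print([row["id"] for row in first_batch])   # ['a', 'b', 'c']
print([row["id"] for row in second_batch])  # ['a']
print(mark)                                 # 4
```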
Can Azure Search indexers handle data source updates that occur while the indexer is running?
Yes. Updates made while a run is in progress are not lost: with a change detection policy in place, items modified after the indexer has read them are picked up by a subsequent run.
Can you recover a deleted Azure Search Indexer?
No, once deleted, an Azure Search Indexer cannot be recovered. As such, it is advised to take caution when deleting indexers.