Azure Purview, now part of Microsoft Purview, is Microsoft’s unified data governance platform that helps organizations build a comprehensive understanding of their data landscape. A key capability of the platform is identifying data sources, which both supports appropriate data usage and improves the overall efficiency of analytics solutions. Let’s look at how to identify different data sources using Azure Purview.
Getting started with Azure Purview
Before jumping into data identification, it’s worth understanding the basics of how Purview works. Beyond providing data insights, Purview helps you create a data catalog, rationalize your digital estate, and maintain compliance with privacy regulations. For identifying data sources, Azure Purview can scan a wide range of sources, both on Azure and on other platforms.
Scanning in Azure Purview
Azure Purview allows you to scan your data estate using scan rule sets. A scan rule set is a collection of rules that determines how a data source is scanned. A single source can have multiple scans, each based on a different rule set. Each rule set has settings for:
- Data sensitivity
- Classification rules
- Data schema extraction
You can configure these settings to suit the needs of a specific scan. Scans can be run once or scheduled on a recurring basis.
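The relationship described above (one source, several scans, each with its own rule set and an optional schedule) can be sketched as a small data model. This is only an illustration: the class names, fields, and the rule set names are hypothetical and are not Purview SDK types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ScanRuleSet:
    """Hypothetical stand-in for a scan rule set (not a Purview SDK type)."""
    name: str
    classification_rules: Tuple[str, ...] = ()
    extract_schema: bool = True


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan definition: one source, one rule set, optional schedule."""
    name: str
    data_source: str
    rule_set: ScanRuleSet
    recurrence: Optional[str] = None  # None means the scan runs once

    def is_recurring(self) -> bool:
        return self.recurrence is not None


def scans_for(data_source: str, scans: List[Scan]) -> List[Scan]:
    """Return every scan registered against the given data source."""
    return [s for s in scans if s.data_source == data_source]


# Two scans of the same source, each using a different (made-up) rule set.
pii_rules = ScanRuleSet("PiiRules", ("Email Address", "Credit Card Number"))
schema_only = ScanRuleSet("SchemaOnly")
example_scans = [
    Scan("WeeklyPiiScan", "TestAzureBlobDataSource", pii_rules, recurrence="weekly"),
    Scan("OneOffSchemaScan", "TestAzureBlobDataSource", schema_only),
]

for scan in scans_for("TestAzureBlobDataSource", example_scans):
    mode = "recurring" if scan.is_recurring() else "run once"
    print(f"{scan.name}: rule set {scan.rule_set.name}, {mode}")
```

Keeping the rule set separate from the scan mirrors the idea that the same rule set can be reused by several scans, while each scan decides on its own whether it runs once or on a schedule.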
Here is an example using Azure Blob Storage.
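First, register Azure Blob Storage as a data source. The command below illustrates the parameters involved; command groups and flags in the Azure CLI purview extension vary by version, so check az purview --help before running it: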
az purview account data-sources create --account-name purviewaccount --collection-name DefaultCollection --data-source-name TestAzureBlobDataSource --kind AzureBlobStorage --resource-group purviewRG
Then, create a scan of the Azure Blob data source (again, verify the exact command against your installed CLI version):
az purview scan create --account-name purviewaccount --collection-name DefaultCollection --scan-name TestScanAzureBlob --data-source-name TestAzureBlobDataSource --resource-group purviewRG
Once a scan is complete, you can view the scan results in Azure Purview Studio.
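The same two steps (register a source, then define a scan on it) are also commonly done through a REST API. The sketch below only builds the request descriptions and sends nothing. The host pattern, URL paths, api-version, and body fields are assumptions based on the general shape of the scanning data-plane API and should be checked against the current reference; the storage endpoint and the rule set name are made up, and the kind value is simply copied from the commands above.

```python
import json
from urllib.parse import quote

# Assumption: illustrative api-version; confirm against the current REST reference.
API_VERSION = "2022-02-01-preview"


def _base_url(account: str) -> str:
    # Assumption: classic account host pattern; newer tenants may use another host.
    return f"https://{account}.purview.azure.com/scan/datasources"


def data_source_request(account, name, kind, endpoint, collection):
    """Describe (without sending) a PUT that registers a data source."""
    body = {
        "kind": kind,
        "properties": {
            "endpoint": endpoint,
            "collection": {"referenceName": collection, "type": "CollectionReference"},
        },
    }
    url = f"{_base_url(account)}/{quote(name, safe='')}?api-version={API_VERSION}"
    return {"method": "PUT", "url": url, "body": json.dumps(body, indent=2)}


def scan_request(account, data_source, scan_name, kind, rule_set, rule_set_type="Custom"):
    """Describe (without sending) a PUT that defines a scan on a data source."""
    body = {
        "kind": kind,
        "properties": {"scanRulesetName": rule_set, "scanRulesetType": rule_set_type},
    }
    url = (
        f"{_base_url(account)}/{quote(data_source, safe='')}"
        f"/scans/{quote(scan_name, safe='')}?api-version={API_VERSION}"
    )
    return {"method": "PUT", "url": url, "body": json.dumps(body, indent=2)}


register = data_source_request(
    "purviewaccount",
    "TestAzureBlobDataSource",
    "AzureBlobStorage",  # copied from the article; the service may expect another identifier
    "https://examplestorage.blob.core.windows.net/",  # hypothetical storage account
    "DefaultCollection",
)
scan = scan_request(
    "purviewaccount",
    "TestAzureBlobDataSource",
    "TestScanAzureBlob",
    "AzureBlobStorage",  # copied from the article; scan kinds may differ from source kinds
    "ExampleRuleSet",  # hypothetical rule set name
)

for request in (register, scan):
    print(request["method"], request["url"])
    print(request["body"])
```

Actually sending these requests would also require an authenticated session (for example, a bearer token for the Purview resource), which is deliberately left out here.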
Identifying Data Sources
Azure Purview supports a variety of data sources, which can be broadly grouped into four categories:
- Azure Data Sources: This includes services such as Azure Data Lake Storage, Azure SQL Database, Azure Cosmos DB, etc.
- On-premises Data Sources: This includes SQL Server, Teradata, SAP, etc.
- Multi-Cloud Data Sources: This includes Amazon S3, Google BigQuery, etc.
- SaaS (Software-as-a-Service) Data Sources: This includes Microsoft Power BI, ServiceNow, etc.
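When taking inventory of an estate, it can help to tag each source with one of the four categories listed above. The lookup below is a minimal sketch that uses only the example sources named in this list; the category labels and the "Unknown" fallback are choices made for illustration, not Purview terminology.

```python
from collections import defaultdict

# Example sources from the list above, mapped to their category.
SOURCE_CATEGORIES = {
    "Azure Data Lake Storage": "Azure",
    "Azure SQL Database": "Azure",
    "Azure Cosmos DB": "Azure",
    "SQL Server": "On-premises",
    "Teradata": "On-premises",
    "SAP": "On-premises",
    "Amazon S3": "Multi-cloud",
    "Google BigQuery": "Multi-cloud",
    "Microsoft Power BI": "SaaS",
    "ServiceNow": "SaaS",
}


def group_by_category(sources):
    """Group source names by category; unrecognized names go under 'Unknown'."""
    grouped = defaultdict(list)
    for source in sources:
        grouped[SOURCE_CATEGORIES.get(source, "Unknown")].append(source)
    return dict(grouped)


inventory = ["Amazon S3", "SQL Server", "Azure Cosmos DB", "Teradata"]
for category, names in group_by_category(inventory).items():
    print(f"{category}: {', '.join(names)}")
```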
Each of these data sources has different configuration settings that Azure Purview needs in order to access and scan it. For example, for Azure Blob Storage:
| Property | Value/Description |
|---|---|
| Kind | AzureBlobStorage |
| Endpoint | Storage account URL |
| Root | The starting point for scanning |
| Key | Azure storage key |
On the other hand, for a SQL Server Database:
| Property | Value/Description |
|---|---|
| Kind | SqlServerDatabase |
| Endpoint | SQL Server Database URL |
| Database | Database name |
| User | SQL Server username |
| Password | SQL Server password |
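Because each kind of source needs a different set of properties, a quick pre-flight check can catch incomplete configurations before a scan is attempted. The sketch below takes the property names directly from the two tables above; the function name and the sample values are hypothetical, and a real setup would keep secrets in a vault rather than in a plain dictionary.

```python
# Required properties per source kind, as listed in the tables above.
REQUIRED_PROPERTIES = {
    "AzureBlobStorage": ("Endpoint", "Root", "Key"),
    "SqlServerDatabase": ("Endpoint", "Database", "User", "Password"),
}


def missing_properties(kind, config):
    """Return the required properties that are absent or empty for this kind."""
    if kind not in REQUIRED_PROPERTIES:
        raise ValueError(f"Unknown data source kind: {kind}")
    return [name for name in REQUIRED_PROPERTIES[kind] if not config.get(name)]


# Hypothetical, incomplete SQL Server configuration.
sql_config = {"Endpoint": "sqlserver.example.local", "Database": "Sales"}
print(missing_properties("SqlServerDatabase", sql_config))
```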
Remember that Azure Purview doesn’t store your data; it only catalogs metadata about the data sources.
Conclusion
In summary, Azure Purview assists enterprises in understanding their data landscape by identifying and cataloging various data sources, both in Azure and on other platforms. This functionality is crucial for any organization looking to improve its data management practices and build an effective enterprise-scale analytics solution. It is also relevant when preparing for the DP-500 exam, which tests your understanding of designing and implementing such solutions using Microsoft Azure and Microsoft Power BI.
Practice Test
True or False: Microsoft Purview is a unified data governance service that helps users manage and govern their on-premises, multi-cloud, and software-as-a-service (SaaS) data.
- True
- False
Answer: True.
Explanation: Microsoft Purview is a data governance service that provides a holistic, unified view of your data across on-premises, multi-cloud, and SaaS systems.
Which of the following can you use Microsoft Purview to do?
- A) Identifying sensitive data
- B) Classifying data across many repositories
- C) Mapping data lineage
- D) All of the above
Answer: D) All of the above.
Explanation: Microsoft Purview allows users to identify sensitive data, classify data across various repositories, and map data lineage for governance and compliance purposes.
True or False: Microsoft Purview only supports Azure data sources.
- True
- False
Answer: False.
Explanation: Microsoft Purview supports a wide range of data sources both on-premises and in the cloud, including but not limited to Azure data sources.
In Microsoft Purview, what is Source-Data Lineage?
- A) The process of tracking data from its source to its destination
- B) A method of data encryption
- C) A data storage system
- D) A method to track user activity
Answer: A) The process of tracking data from its source to its destination
Explanation: Source-data lineage in a data governance context refers to tracking data from its origin through its lifecycle.
True or False: Microsoft Purview can classify and label sensitive data across a wide range of data sources.
- True
- False
Answer: True.
Explanation: Microsoft Purview can scan, classify, and label sensitive data across a wide range of data sources, enabling better data governance and compliance with regulations such as GDPR.
True or False: Microsoft Purview supports Amazon S3 as a data source.
- True
- False
Answer: True.
Explanation: Microsoft Purview supports a large variety of data sources, including Azure sources, Power BI, Teradata, SAP, SQL Server, Oracle, and Amazon S3.
Which of the following is NOT a purpose of using Microsoft Purview?
- A) Data governance and cataloging
- B) Establishing data lineage
- C) Determining data ownership and responsibility
- D) Performing complex data computations
Answer: D) Performing complex data computations
Explanation: While Microsoft Purview is designed for data governance, cataloging, establishing data lineage, and determining data ownership, it’s not designed to perform complex computations on data.
True or False: Microsoft Purview can be helpful in achieving compliance with regulations such as GDPR.
- True
- False
Answer: True
Explanation: By classifying, cataloging, and protecting sensitive data, Microsoft Purview can assist businesses in achieving compliance with regulations like GDPR.
The Azure Purview Studio is the web interface for interacting with Azure Purview, where you can do everything from creating data sources to seeing scanning and classification results. Is this statement true or false?
- True
- False
Answer: True
Explanation: The Azure Purview Studio is indeed the central hub for interacting with Azure Purview, and it’s where you manage data sources and view scanning and classification results.
True or False: When you delete a data source in Microsoft Purview, it also deletes the data in the actual data source.
- True
- False
Answer: False
Explanation: Deleting a data source in Microsoft Purview only deletes the metadata representation of the data and not the actual data in the source.
Interview Questions
What is Microsoft Purview?
Microsoft Purview is a unified data governance service that helps organizations achieve a complete understanding of their data. It allows businesses to visualize, manage, and catalog data irrespective of where it resides.
How can Azure Purview help with identifying data sources?
Azure Purview allows you to scan and classify data across a wide array of sources, including multi-cloud and on-premises systems. This aids in discovering and understanding data, and can help reduce problems such as data silos.
What type of data sources can be scanned by Azure Purview?
Azure Purview supports a wide variety of data sources for scanning such as Azure Data Lake Storage, Azure Blob Storage, Azure SQL Database, Azure Cosmos DB, Azure Synapse Analytics, Power BI, SQL Server, Amazon S3, and more.
What are Azure Purview’s data scanner pools and scanner queues?
These are components that work together to scan data from different data sources. Azure Purview automatically provisions a data scanner queue for each supported data source type, while the scanner pool operates and provisions resources to perform scans.
What is Azure Purview Data Catalog?
This is a fully managed cloud service facilitated by Microsoft Purview that serves as a system of observation. It helps users to discover and understand the data they want to use and manage within their organization.
What is the purpose of the Azure Purview Data Map?
Azure Purview data map automatically creates a graph of the relationships between different data assets across their various locations. It provides insights into the lineage of data, such as where it’s coming from and where it’s going.
How does Azure Purview classify data?
Azure Purview uses pre-built classifiers that are designed to recognize specific patterns in the data. You can use these, or create your own custom classifiers based on rules that can match patterns or names.
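To make the idea of pattern-based classification concrete, here is a minimal sketch of a rule that matches either on sampled values or on the column name. It is not Purview’s classification engine: the regular expressions, the labels, and the 60% match threshold are all assumptions chosen for the example.

```python
import re

# Hypothetical data patterns: applied to sampled values in a column.
DATA_PATTERNS = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
}

# Hypothetical column patterns: applied to the column name itself.
COLUMN_PATTERNS = {
    "Email Address": re.compile(r"e[-_]?mail", re.IGNORECASE),
}


def classify_column(name, samples, min_match_ratio=0.6):
    """Return sorted labels whose name pattern or data pattern matches the column."""
    labels = set()
    for label, pattern in COLUMN_PATTERNS.items():
        if pattern.search(name):
            labels.add(label)
    values = [value for value in samples if value]  # ignore empty samples
    if values:
        for label, pattern in DATA_PATTERNS.items():
            hits = sum(1 for value in values if pattern.fullmatch(value))
            if hits / len(values) >= min_match_ratio:
                labels.add(label)
    return sorted(labels)


print(classify_column("contact", ["a@b.com", "c@d.org", "n/a"]))
print(classify_column("notes", ["hello", "world"]))
```

The threshold exists because real columns are rarely clean: requiring only a share of the sampled values to match lets a column with a few placeholder entries still be classified, at the cost of some false positives if the threshold is set too low.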
What is the significance of Azure Purview’s data lineage feature?
The data lineage feature provides transparency into data processes. It provides insights into where the data originated, where it is currently, and where it is moving, allowing users to track changes and errors.
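Conceptually, lineage is a directed graph of assets, and the questions "where did this come from?" and "what does this feed?" are graph traversals. The sketch below uses a plain dictionary as the graph; the asset names are invented for illustration and the code does not call any Purview API.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets derived from it.
LINEAGE = {
    "blob://raw/sales.csv": ["sql://staging.sales"],
    "sql://staging.sales": ["sql://warehouse.fact_sales"],
    "sql://warehouse.fact_sales": ["powerbi://Sales Dashboard"],
}


def downstream(asset, edges):
    """Breadth-first list of every asset that depends on the given asset."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in seen:  # also guards against cycles
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


def upstream(asset, edges):
    """Every asset the given asset was derived from, nearest first."""
    reversed_edges = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_edges.setdefault(target, []).append(source)
    return downstream(asset, reversed_edges)


print(downstream("blob://raw/sales.csv", LINEAGE))
print(upstream("sql://warehouse.fact_sales", LINEAGE))
```

Downstream traversal supports impact analysis (what breaks if a source changes), while upstream traversal supports root-cause analysis (where an error in a report could have originated).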
Is Azure Purview built to support GDPR and other compliance regulations?
Yes. Azure Purview was created to assist organizations in handling sensitive data in compliance with data regulations like GDPR, CCPA, and others. It allows businesses to automatically discover, classify and catalog sensitive data for easier data privacy and governance.
Can Azure Purview connect with on-premises data sources?
Yes. Azure Purview can connect with both cloud and on-premises data sources, including proprietary and third-party data sources.
Does Azure Purview use encryption to secure data?
Yes. Azure Purview uses Azure-managed keys for data encryption at rest.
How is the cost determined for Azure Purview?
The cost of Azure Purview is determined by the number of data assets scanned, the size of the storage used by the data catalog, and the number of data map units consumed.
Can Azure Purview be integrated with other Azure services?
Yes, Azure Purview can seamlessly integrate with many Azure services including Azure Synapse Analytics, Azure Data Factory, Power BI and more.
Can Azure Purview be used for data privacy and protection?
Yes, Azure Purview helps organizations adhere to data privacy laws by classifying, cataloging and understanding the sensitivity of the data across their organization.
What role does Azure Purview play in data democratization?
Azure Purview helps companies democratize their data by making it readily accessible and understandable to non-technical users. It creates a unified data landscape that can be accessed and understood by a variety of roles within an organization.