Microsoft Purview Data Catalog is an enterprise data management solution that helps organizations understand their data landscape. It offers a unified data governance service that lets users discover, understand, govern, and consume data sources. It runs on Azure, and one of its most useful features is metadata management: you can browse and search metadata across your registered data sources.
Understanding Metadata in Microsoft Purview Data Catalog
Metadata is data about data. Purview Data Catalog uses this information to help organize, manage, and discover data sources across your business. The catalog holds metadata about several kinds of data:
- Structured data such as database tables and columns,
- Semi-structured data like CSV or Excel file formats, and
- Unstructured data such as Word documents or PDF files.
Metadata scans executed by Purview reveal information like data sensitivity, data lineage, and data relationships.
Browsing Metadata in Microsoft Purview Data Catalog
Microsoft Purview lets you browse metadata assets in the data catalog. Browsing works like navigating a hierarchy in which you explore available data sources, collections, or assets. Collections are groupings of assets organized according to a business need or function.
Here’s an example of how you navigate the hierarchy:
- Select your chosen data map.
- Choose either ‘Sources’, which shows the physical locations of scanned data, or ‘Collections’, which contains the grouped data assets.
- Click on a data source or a collection.
The result will display a summary of the source or collection including a description, last scan date, and total number of assets. You can click on any asset to explore it further.
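The browse hierarchy can be pictured as a tree of collections, each holding assets and sub-collections. The sketch below is only a mental model in plain Python: the `Collection` class, the collection names, and the asset names are all hypothetical and are not part of any Purview SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a Purview collection (not a real SDK type)."""
    name: str
    assets: list[str] = field(default_factory=list)
    children: list[Collection] = field(default_factory=list)


def total_assets(collection: Collection) -> int:
    """Count assets in a collection and in all of its sub-collections."""
    return len(collection.assets) + sum(total_assets(c) for c in collection.children)


def outline(collection: Collection, depth: int = 0) -> list[str]:
    """Return one indented line per collection, similar to a browse view."""
    lines = [f"{'  ' * depth}{collection.name} ({total_assets(collection)} assets)"]
    for child in collection.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Illustrative hierarchy with made-up names.
root = Collection(
    "Contoso",
    children=[
        Collection("Finance", assets=["dbo.Invoices", "dbo.Payments"]),
        Collection("Marketing", assets=["campaigns.csv"]),
    ],
)
print("\n".join(outline(root)))
```

Each line of the outline corresponds to the summary you see when you click a collection: its name and the number of assets beneath it.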
Searching Metadata in Purview Data Catalog
Microsoft Purview’s search functionality goes beyond simple keyword searches. It offers faceted search, which allows the filtering of results based on specific criteria like sensitivity labels, file formats, annotations, classifications, schema names, and data sources.
To search for metadata in Purview Data Catalog, follow these steps:
- Use the Search bar located at the top of the Purview Studio page.
- Enter your query. It can be a term, phrase, or specific keyword related to the data you’re seeking.
- Click the ‘Search’ button or press ‘Enter’ on your keyboard to initiate the search.
The search results list the metadata assets related to your query, such as tables, files, views, or any other asset whose metadata matches the query. Each result also shows the asset’s description, classifications, sensitivity labels, and data source.
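The same kind of faceted search can be expressed programmatically as a JSON request body. The sketch below only builds that body and sends nothing. The field names (`keywords`, `limit`, `filter`, `facets`, `classification`, `fileExtension`) are assumptions modeled on the Purview discovery query API and should be checked against the current REST reference. The classification name in the example is a made-up placeholder.

```python
import json


def build_search_body(keywords, classifications=(), file_extensions=(), limit=25):
    """Build a JSON-serializable body for a keyword search with facet filters.

    Field names are assumptions; verify them against the REST reference.
    """
    conditions = [{"classification": c} for c in classifications]
    conditions += [{"fileExtension": e} for e in file_extensions]
    body = {
        "keywords": keywords,
        "limit": limit,
        # Request per-facet counts so a client can render filter choices.
        "facets": [
            {"facet": "classification", "count": 10},
            {"facet": "assetType", "count": 10},
        ],
    }
    if conditions:
        # Every condition must hold, the way stacked facet filters narrow results.
        body["filter"] = {"and": conditions}
    return body


body = build_search_body(
    "customer",
    classifications=["EXAMPLE.CLASSIFICATION"],  # hypothetical classification name
    file_extensions=["csv"],
)
print(json.dumps(body, indent=2))
```

Keeping the body builder separate from any HTTP code makes it easy to inspect and unit test the filters before a request is ever sent.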
Conclusion
Through Microsoft Purview’s Data Catalog, you can browse and search metadata effectively, thereby enhancing your organization’s ability to understand, manage, and use data assets. Because the catalog is a managed cloud service, you also do not have to run or maintain the catalog infrastructure yourself. Whether you are preparing for the DP-203 Data Engineering on Microsoft Azure exam or simply looking to advance your data management capabilities, understanding how to browse and search metadata in Microsoft Purview Data Catalog should be a key part of your strategy.
Practice Test
True or False? In Microsoft Purview data catalog, you can set your metadata’s visibility.
- True
- False
Answer: True
Explanation: Microsoft Purview data catalog lets you control who can view your metadata, for example by organizing assets into collections and assigning roles on those collections, hence this statement is True.
When you register a data source in Microsoft Purview Data Catalog, it automatically scans metadata. Is this statement true or false?
- True
- False
Answer: False
Explanation: Registering a data source only adds it to the data map. Metadata is scanned and classified when you set up and run a scan on the registered source, either once or on a schedule.
What can you do with Microsoft Purview data catalog? (Multiple select)
- A. Search metadata
- B. Browse metadata
- C. Edit metadata
- D. Delete metadata
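Answer: A, B, C, D
Explanation: Microsoft Purview data catalog allows you to browse and search metadata. Users with the appropriate permissions (for example, a data curator role) can also edit asset metadata such as descriptions and classifications, and delete assets.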
The Microsoft Purview data catalog supports data sensitivity labeling. Is this true or false?
- True
- False
Answer: True
Explanation: Microsoft Purview data catalog supports data sensitivity labeling, which helps businesses to safeguard their data.
Which of the following can be considered as an asset type in Microsoft Purview data catalog? (Single Select)
- A. Databases
- B. Spreadsheets
- C. Files
- D. All of the above
Answer: D. All of the above
Explanation: Microsoft Purview data catalog treats databases, files (including spreadsheets), tables, and views as asset types.
True / False. In Microsoft Purview’s Data Catalog, the classification of data is a manual process.
- True
- False
Answer: False
Explanation: The Microsoft Purview data catalog automatically classifies data based on more than 100 built-in classifiers.
What search query language does Microsoft Purview Data Catalog use?
- A. Apache Lucene
- B. Google BigQuery
- C. Microsoft SQL
- D. MongoDB
Answer: A. Apache Lucene
Explanation: Microsoft Purview Data Catalog uses Apache Lucene as the search query language.
Which of the following data sources can be scanned by Microsoft Purview data catalog? (Multiple select)
- A. Azure SQL Database
- B. Amazon Redshift
- C. Oracle on-premises database
- D. Microsoft Access file
Answer: A, B, C
Explanation: Microsoft Purview supports a wide range of data sources, including Azure SQL Database, Amazon Redshift, and on-premises Oracle databases. Microsoft Access files are not a supported source.
Microsoft Purview Data Catalog automatically maintains an up-to-date view of your data landscape. True or False?
- True
- False
Answer: True
Explanation: Microsoft Purview Data Catalog automatically scans and catalogs data from various data sources, maintaining an up-to-date view of your data landscape.
You can also use Microsoft Purview (formerly Azure Purview) to understand data lineage. True or False?
- True
- False
Answer: True
Explanation: Microsoft Purview also provides data lineage capabilities, helping organizations to understand where their data comes from, where it goes, and how it changes over time.
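To make the idea of lineage concrete, it can be modeled as a directed graph in which each asset points to the assets it was derived from. The sketch below is a plain-Python illustration, not Purview's lineage API; the `lineage` dictionary and asset names are invented for the example.

```python
from collections import deque


def upstream(lineage, asset):
    """Return every asset that feeds into `asset`, directly or indirectly."""
    # `lineage` maps each asset to the list of assets it is derived from.
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for source in lineage.get(current, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


# Hypothetical lineage: a report built from a cleaned table built from raw files.
lineage = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["raw_orders.csv", "raw_customers.csv"],
}
print(sorted(upstream(lineage, "sales_report")))
```

Answering "where does this data come from" is then a breadth-first walk over the graph, which is the question a lineage view answers visually.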
Interview Questions
What is Microsoft Purview Data Catalog?
Microsoft Purview Data Catalog is a fully managed cloud service aimed at maximizing business value from data. It assists organizations in tracking their data landscape by providing a multi-tenant, cloud-based service that can classify, manage and discover data across a wide range of sources.
How does Purview help in managing metadata?
Purview enables you to register and map your data sources. As data sources are scanned, metadata is ingested into the system. It helps in understanding the data landscape across the organization and allows you to organize, discover and access metadata efficiently.
How can you conduct a search in Purview?
You can search in Purview using the search bar located in the Purview Studio. Users can input search terms, labels, classifications and use filters to refine the search.
Are Purview’s search results affected by access permissions?
Yes, Purview only shows search results for assets that the user has permission to access. This way, sensitive information is protected even within the search environment.
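A minimal sketch of this permission trimming, assuming access is granted per collection: results are kept only when the user can read the collection that holds the asset. The result dictionaries and the `readable_collections` argument are hypothetical and do not reflect Purview's internal implementation.

```python
def visible_results(results, readable_collections):
    """Keep only results whose collection the user is allowed to read."""
    allowed = set(readable_collections)
    return [r for r in results if r["collection"] in allowed]


# Made-up search hits, each tagged with the collection that owns the asset.
results = [
    {"name": "dbo.Invoices", "collection": "Finance"},
    {"name": "campaigns.csv", "collection": "Marketing"},
]
print(visible_results(results, ["Marketing"]))
```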
What is the purpose of the Data Map in Microsoft Purview?
Data Map serves as the indexing layer for Purview. It scans and catalogs data from various sources and maintains a graph of assets, classifications, labels, and other properties, making it easier for users and systems to find and understand the data they are looking for.
Does Microsoft Purview support classification of data?
Yes, Purview supports automatic classification of data using more than 100 built-in classifiers. You can also create custom classifiers according to your organization’s needs.
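One way to picture a pattern-based custom classifier is a regular expression applied to a sample of column values, with the classification assigned only when enough values match. The sketch below illustrates that idea in plain Python. The `EMP-` pattern, the function name, and the 60% threshold are assumptions chosen for the example, not Purview's actual rule engine.

```python
import re


def classify_column(values, pattern, min_match_ratio=0.6):
    """Return True when enough non-empty sample values fully match the pattern."""
    compiled = re.compile(pattern)
    sample = [v for v in values if v]  # ignore empty cells
    if not sample:
        return False
    matches = sum(1 for v in sample if compiled.fullmatch(v))
    return matches / len(sample) >= min_match_ratio


# Hypothetical custom rule: internal employee IDs such as "EMP-004217".
EMPLOYEE_ID = r"EMP-\d{6}"
print(classify_column(["EMP-004217", "EMP-000031", "n/a"], EMPLOYEE_ID))
```

Using a match ratio rather than requiring every value to match tolerates a few dirty cells in the sample while still rejecting columns that match only by coincidence.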
Can you add custom glossary terms in Microsoft Purview?
Yes, you can add custom glossary terms in Microsoft Purview. These terms can be associated with assets and can help improve search and understanding of your data assets.
What kind of data sources can be scanned by Purview?
Purview supports a wide range of data sources including Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, and SQL Server, amongst others.
How can I set up a scan in Microsoft Purview?
In the Purview Studio, you can configure a new scan by selecting a data source, setting up the scan rules, determining the schedule and then finally reviewing and creating the scan.
How does Purview help meet compliance requirements?
Purview helps meet compliance requirements by providing a detailed understanding of your data landscape. It allows you to identify sensitive data and manage access to it, enabling you to create policies to protect such data and ultimately comply with regulations.
Can I export data from Microsoft Purview?
Yes, you can export metadata from Microsoft Purview in the CSV format. The exported file includes details about the assets stored in your Purview account, such as asset name, parent/child asset relationships, classifications, glossary terms, etc.
How is Microsoft Purview different from Azure Data Catalog?
Microsoft Purview is essentially an evolved version of Azure Data Catalog and comes with additional features. While Azure Data Catalog mainly focused on cataloging capabilities in Azure, Purview extends data governance across all data sources irrespective of location: on-premises, multi-cloud, or software-as-a-service (SaaS).
Can Microsoft Purview integrate with other Azure Services?
Yes. Purview can map and catalog data from various Azure services, and it can also integrate with services like Azure Synapse Analytics to provide unified data governance and advanced analytics capabilities.
Can I use APIs with Microsoft Purview?
Yes, Purview provides REST APIs that can be used to interact with the service programmatically, offering functionalities such as data catalog management, entity search, data lineage and more.
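As a rough sketch of what calling such an API looks like, the code below builds an HTTP request for an entity search using only the Python standard library, and does not send it. The endpoint shape, the `api-version` value, the account name `contoso-purview`, and the body fields are assumptions to confirm in the REST reference. The bearer token is a placeholder that would normally come from Microsoft Entra ID.

```python
import json
import urllib.request


def build_search_request(account_name, token, keywords, limit=10):
    """Build (but do not send) an HTTP POST for a catalog search."""
    # Assumed endpoint shape and api-version; confirm both before use.
    url = (
        f"https://{account_name}.purview.azure.com"
        "/catalog/api/search/query?api-version=2022-03-01-preview"
    )
    body = json.dumps({"keywords": keywords, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_search_request("contoso-purview", "<access-token>", "customer")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because it needs a real account and a valid token.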
Can you access Microsoft Purview Data Catalog from mobile devices?
Currently, Microsoft Purview Data Catalog can be accessed from desktop browsers, but not from mobile devices.