Identifying data means knowing what kind of data you are dealing with and how it is structured; access patterns are the ways in which that data is read and written. A solid understanding of both is vital for passing the DP-420 exam.
Data Identification in Azure Cosmos DB
The first step towards making full use of Azure Cosmos DB is identifying the type of data that will be stored and operated on. Azure Cosmos DB exposes several APIs, each tied to a data model; the two covered here are the SQL API (now called the API for NoSQL) and the MongoDB API.
The SQL API uses a document model. It suits applications that need a flexible schema, because items are stored as JSON and can hold semi-structured and hierarchical data. Every property is indexed by default. Here’s an example of a SQL API document:
{
  "id": "001",
  "name": "John Doe",
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "orders": [
    {
      "id": "1001",
      "date": "1/1/2020",
      "product": "Widget X"
    },
    …
  ]
}
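As a minimal Python sketch, assuming the document above has been parsed into a dictionary (only the first order is included, since the rest are elided), the nested address and the embedded orders array can be read from the same item:

```python
import json

# The sample customer document, with only the first order included.
raw = """
{
  "id": "001",
  "name": "John Doe",
  "address": {"street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"},
  "orders": [
    {"id": "1001", "date": "1/1/2020", "product": "Widget X"}
  ]
}
"""

customer = json.loads(raw)

# Nested objects and arrays travel with the parent item, so one read
# returns the customer, the address, and the embedded orders together.
city = customer["address"]["city"]
products = [order["product"] for order in customer["orders"]]

print(city)
print(products)
```

Embedding related data this way tends to pay off when the application usually needs the customer and the orders at the same time.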
The MongoDB API uses a BSON data model, similar to JSON, that supports rich data types. This is ideal for applications leveraging MongoDB and wanting to benefit from Azure Cosmos DB’s scalability and performance.
Understanding your data’s structure and type is essential for knowing how to partition data effectively, how to model and build relationships between data, and how to query data efficiently.
Identifying Access Patterns
Access patterns describe how your application reads, writes, updates, and deletes data. They matter in any cloud-based solution design, and especially in Azure Cosmos DB, because they influence how data is partitioned, replicated, and transferred over networks.
Azure Cosmos DB offers predictable, single-digit millisecond latency at P99. To benefit from it, the access patterns must be analyzed thoroughly and the data model designed around them.
When defining access patterns for Azure Cosmos DB, the following considerations are vital:
- Understand the frequency of reads and writes: Data that is read more frequently than it is written would be handled differently from data that is mostly written, rarely read.
- Understand the most common queries: Determining the queries that will be called most often can help in designing the access patterns for speedy data retrieval.
- Know the consistency requirements: Cosmos DB offers five consistency levels. Knowing which operations must see the latest writes helps you choose the weakest level that still gives the application correct results.
You can identify the access patterns by documenting all the CRUD operations (i.e., Create, Read, Update, Delete) your application performs and determining the volume, velocity, and variety of those reads and writes.
For example, a Customer Relationship Management (CRM) application may have the following access patterns:
Access Pattern | Operation |
---|---|
Create a new customer | Write |
Update customer information | Write |
Search for a customer | Read |
Delete customer record | Delete |
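The table above can also be kept as data and summarized. The following Python sketch is one way to do that; the `AccessPattern` class and the `calls_per_day` figures are hypothetical values made up for illustration, not measurements from a real CRM system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    operation: str       # "Read", "Write", or "Delete", as in the table
    calls_per_day: int   # hypothetical estimate, not a measured value


patterns = [
    AccessPattern("Create a new customer", "Write", 2_000),
    AccessPattern("Update customer information", "Write", 8_000),
    AccessPattern("Search for a customer", "Read", 90_000),
    AccessPattern("Delete customer record", "Delete", 100),
]

# Total the estimated calls per operation type.
totals = Counter()
for p in patterns:
    totals[p.operation] += p.calls_per_day

total_calls = sum(totals.values())
read_share = totals["Read"] / total_calls
workload = "read-heavy" if read_share > 0.5 else "write-heavy"

print(dict(totals), workload)
```

With estimates like these, the workload is dominated by reads, which would argue for modeling and indexing around the search path first.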
Once you have an understanding of the data and their access patterns, you can model and partition your database to achieve the high scalability, availability, and performance offered by Azure Cosmos DB.
Understanding your data and its access patterns is fundamental to designing and implementing cloud-native applications with Azure Cosmos DB, and both are central topics in the DP-420 exam (Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB). Spend time on these topics as you prepare.
Practice Test
True or False: Cosmos DB supports geo-redundancy?
- True
- False
Answer: True
Explanation: Azure Cosmos DB allows for configuring geo-redundancy, which means it can replicate data across multiple regions for fast access around the globe.
In a partition key within Cosmos DB, you can include as many properties as you want.
- True
- False
Answer: False
Explanation: A partition key cannot span an arbitrary number of properties. It is normally a single property path, and hierarchical partition keys allow at most three levels. The choice is a significant design decision because it directly affects performance.
In Cosmos DB, the partition key selection is essential for scaling and performance.
- True
- False
Answer: True
Explanation: The partition key in Cosmos DB is critically important for performance and scale, as it governs how data is distributed across physical partitions.
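To make the effect of partition key choice concrete, here is a small Python simulation. It is only a sketch: the SHA-256 based `partition_for` function is a stand-in for Cosmos DB's internal hashing, and the partition count and key values are hypothetical.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 4  # hypothetical count chosen for the illustration


def partition_for(key_value: str) -> int:
    """Map a partition key value to a partition with a stand-in hash."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def distribution(key_values):
    """Count how many items land on each partition."""
    return Counter(partition_for(v) for v in key_values)


# High-cardinality key: one distinct value per customer.
by_customer = distribution(f"customer-{i}" for i in range(10_000))

# Low-cardinality key: every item shares one of two values.
by_state = distribution("CA" if i % 2 else "NY" for i in range(10_000))

print(sorted(by_customer.items()))
print(sorted(by_state.items()))
```

The high-cardinality key spreads items across all partitions, while the two-valued key can never use more than two of them, no matter how many partitions exist.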
What are the primary APIs offered by Azure Cosmos DB? (Select all that apply)
- SQL API
- MongoDB API
- Cassandra API
- Players API
Answer: SQL API, MongoDB API, Cassandra API
Explanation: Azure Cosmos DB provides several APIs for accessing your data, including SQL (for the document data model), MongoDB (a document API compatible with MongoDB clients), and Cassandra (for the wide-column data model).
True or False: Azure Cosmos DB guarantees less than 10 ms latencies on reads and writes at the 99th percentile worldwide?
- True
- False
Answer: True
Explanation: Azure Cosmos DB guarantees single-digit-millisecond read and write latencies at the 99th percentile worldwide, which greatly enhances the user experience.
True or False: You cannot set the throughput on a per-container basis in Azure Cosmos DB.
- True
- False
Answer: False
Explanation: You can provision throughput on an Azure Cosmos container or database, which allows flexibility in distributing your resources.
What does RU stand for in Azure Cosmos DB?
- Really Useful
- Request Units
- Read Units
- Requirement Units
Answer: Request Units
Explanation: In Azure Cosmos DB, RUs (Request Units) are the measure of throughput. They measure the resources required to perform read or write operations.
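A rough way to turn this into a throughput estimate is sketched below in Python. The 1 RU cost for a point read of a 1 KB item matches the documented baseline; the write and query costs and the peak operation rates are assumptions for illustration and should be replaced with request charges measured from a real workload.

```python
import math

# Assumed per-operation costs in RUs. A point read of a 1 KB item is
# documented as 1 RU; the other figures are placeholders.
RU_COST = {"point_read": 1.0, "write": 5.0, "query": 10.0}


def required_rus_per_second(ops_per_second: dict) -> int:
    """Sum the RU cost of each operation type and round up."""
    total = sum(RU_COST[op] * rate for op, rate in ops_per_second.items())
    return math.ceil(total)


# Hypothetical peak load, in operations per second.
peak_load = {"point_read": 500, "write": 100, "query": 20}
needed = required_rus_per_second(peak_load)

print(needed)
```

Under these assumptions the reads and the writes each account for 500 RU/s even though reads are five times more frequent, which shows why the read/write mix matters when sizing throughput.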
Azure Cosmos DB does not support multiple data models.
- True
- False
Answer: False
Explanation: Azure Cosmos DB is a multi-model database service. It natively supports multiple data models, including document, graph, key-value, table, and column-family data models.
True or False: Optimistic concurrency doesn’t help in overcoming the challenge of concurrent access patterns.
- True
- False
Answer: False
Explanation: Optimistic concurrency is a strategy that ensures the data’s consistency in high concurrency scenarios, by checking whether the data has changed before it is updated.
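Cosmos DB exposes this check through the `_etag` system property and an if-match precondition on writes. The Python sketch below imitates that behaviour with a toy in-memory class; `InMemoryContainer` and `PreconditionFailed` are hypothetical names for illustration and are not part of the Azure SDK.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the stored item changed since it was read."""


class InMemoryContainer:
    """Toy stand-in for a container; not the Azure SDK."""

    def __init__(self):
        self._items = {}

    def upsert(self, item: dict) -> dict:
        # Every successful write assigns a fresh ETag.
        stored = dict(item, _etag=uuid.uuid4().hex)
        self._items[item["id"]] = stored
        return dict(stored)

    def read(self, item_id: str) -> dict:
        return dict(self._items[item_id])

    def replace(self, item: dict, if_match: str) -> dict:
        # Reject the write if someone else updated the item first.
        current = self._items[item["id"]]
        if current["_etag"] != if_match:
            raise PreconditionFailed(item["id"])
        return self.upsert(item)


container = InMemoryContainer()
container.upsert({"id": "001", "name": "John Doe"})

first = container.read("001")
second = container.read("001")

# The first writer succeeds and the item receives a new ETag.
container.replace(dict(first, name="John A. Doe"), if_match=first["_etag"])

# The second writer still holds the old ETag, so the write is rejected.
try:
    container.replace(dict(second, name="J. Doe"), if_match=second["_etag"])
    conflict = False
except PreconditionFailed:
    conflict = True

print(conflict)
```

On a conflict, the usual response is to re-read the item, reapply the change, and retry the write.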
Which consistency option in Azure Cosmos DB provides a balance between consistency and performance?
- Strong
- Eventual
- Bounded staleness
- Session
Answer: Session
Explanation: Session consistency provides a balance between strong and eventual consistency. Within a single client session it guarantees that reads see that session's own writes, while keeping latency and throughput close to those of eventual consistency.
Azure Cosmos DB supports real-time analytics.
- True
- False
Answer: True
Explanation: Yes, Azure Cosmos DB integrates with Azure Synapse Analytics through Azure Synapse Link, which allows you to run near real-time analytics on your operational data.
The partition key in Cosmos DB is stored as plain text without any encryption.
- True
- False
Answer: False
Explanation: Azure Cosmos DB automatically encrypts all data at rest and in transit, and this includes partition key values.
Interview Questions
What does Microsoft Azure Cosmos DB support in terms of consistency levels?
Microsoft Azure Cosmos DB supports five types of consistency levels: Strong, Bounded staleness, Session, Consistent prefix, and Eventual.
What does the term “partitioning” mean in the context of Azure Cosmos DB?
Partitioning in Azure Cosmos DB refers to the process of breaking down data into smaller, more manageable parts, called partitions. This aids in achieving high scalability and performance.
What is the role of a partition key in Azure Cosmos DB?
The partition key in Azure Cosmos DB determines how data is distributed across partitions. A well-chosen key spreads both data and requests evenly, so that the system can manage and serve the data efficiently.
What tool or service in Azure Cosmos DB can you use to export, import, or copy data?
Azure Cosmos DB Data Migration tool is used to export, import, or copy data.
In Azure Cosmos DB, what does a “change feed” provide?
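A change feed in Azure Cosmos DB provides a persistent record of the items created or updated in a container, in the order in which they were modified. The order is guaranteed within each partition key value, and deletes are not captured by default. It enables applications to react to changes as they happen.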
What is a Request Unit (RU) in the context of Azure Cosmos DB?
A Request Unit (RU) in Azure Cosmos DB denotes the cost of resources such as memory, CPU, and I/O needed for read, write, or query operations.
Can the partition key in Azure Cosmos DB be changed after data is already loaded in the database?
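No, in Azure Cosmos DB, the partition key cannot be changed in place once the container is created. To use a different key, create a new container with the desired partition key and copy the data into it.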
How is consistency maintained in Cosmos DB when data is read from a secondary region?
In Cosmos DB, multi-region replication copies data across geographical regions, and the consistency level in effect determines what a read from a secondary region may return. The five levels are Strong, Bounded staleness, Session, Consistent prefix, and Eventual. A default is set on the account and can be relaxed per request.
What does the term “provisioned throughput” mean in Azure Cosmos DB?
Provisioned throughput in Azure Cosmos DB refers to the performance capacity, measured in Request Units, that is allocated and billed for containers or databases.
What is the role of Time to Live (TTL) in Azure Cosmos DB?
TTL in Azure Cosmos DB denotes the time after which the data is automatically deleted from the database. It helps in removing older data automatically from the system.
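The expiry rule can be sketched in Python as follows. This is a simplified model rather than the service's implementation: it assumes a container-level default TTL (with `None` meaning TTL is off and `-1` meaning no expiry by default), an optional item-level `ttl` in seconds, and the `_ts` last-modified timestamp. The `is_expired` helper is a hypothetical name.

```python
from typing import Optional


def is_expired(item: dict, container_default_ttl: Optional[int], now: int) -> bool:
    """Return True if the item is past its time to live.

    container_default_ttl: None means TTL is off for the container,
    -1 means items never expire unless they carry their own ttl.
    """
    if container_default_ttl is None:
        return False  # TTL disabled: item-level ttl values are ignored
    ttl = item.get("ttl", container_default_ttl)
    if ttl == -1:
        return False  # explicitly set to never expire
    # The clock starts at the item's last modification time.
    return now - item["_ts"] >= ttl


session = {"id": "s1", "_ts": 1_000}
print(is_expired(session, container_default_ttl=3_600, now=5_000))
```

Because the countdown is measured from `_ts`, updating an item resets its remaining lifetime.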
How is data consistency maintained during writes in Azure Cosmos DB?
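Data consistency during writes in Azure Cosmos DB is maintained through replication. Each write is durably committed by a quorum (majority) of replicas in the write region before it is acknowledged as successful. With strong consistency on a multi-region account, the write is also committed synchronously in other regions before it is acknowledged.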
What is the function of Indexing Policy in Azure Cosmos DB?
Indexing Policy in Azure Cosmos DB defines which property paths are included in or excluded from the index, the indexing mode, and any composite or spatial indexes, including the sort order of composite index paths.
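As an illustration, an indexing policy for the customer documents might look like the Python dictionary below, serialized to the JSON form the service expects. The property names follow the indexing policy format; the specific paths chosen here (excluding the orders array, a composite index on name and city) are hypothetical choices for this example.

```python
import json

# Hypothetical policy: index everything except the embedded orders,
# and add a composite index to support sorting by name, then city.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/orders/*"}],
    "compositeIndexes": [
        [
            {"path": "/name", "order": "ascending"},
            {"path": "/address/city", "order": "descending"},
        ]
    ],
}

serialized = json.dumps(policy, indent=2)
print(serialized)
```

Excluding paths that are never queried reduces the RU cost of writes, since fewer index entries have to be maintained.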
What are the data models supported by Azure Cosmos DB?
Azure Cosmos DB supports document data model with SQL (JSON) and MongoDB APIs, graph data model with Gremlin API, key-value data model with Azure Table API, and column-family data model with Cassandra API.
Can you perform ACID transactions in Azure Cosmos DB?
Yes. Cosmos DB supports multi-document ACID transactions through stored procedures, triggers, and transactional batches, provided that all items involved share the same partition key value within a single container.
What is the main advantage of using Azure Cosmos DB in a globally distributed application?
The primary advantage is that Azure Cosmos DB provides low latency, high availability and consistent data access, regardless of the geographical location of the client.