In a traditional relational database, related entities are usually stored in separate tables and linked through foreign keys. In Azure Cosmos DB, however, you can nest related entities directly inside their parent document.
Storing Multiple Related Entities in the Same Document
Storing multiple related entities in the same document is a form of denormalization: instead of spreading related data across separate tables, you embed it in the parent's Azure Cosmos DB document so that it is stored and retrieved together.
Consider a simple example of a fictitious e-commerce company, where a customer places multiple orders and each order might have multiple items. In a traditional relational database, we would store customers, orders, and items in separate tables and link these tables with foreign keys.
- Customer (CustomerID, Name, Address)
- Order (OrderID, CustomerID, Date)
- Item (ItemID, OrderID, Product, Quantity)
The above model can be denormalized into a single document in Azure Cosmos DB, with orders nested inside the customer and items nested inside each order:
{
  "CustomerID": "C101",
  "Name": "John Doe",
  "Address": "123 Street, GOTown",
  "Orders": [
    {
      "OrderID": "O001",
      "Date": "2021-12-01",
      "Items": [
        {
          "ItemID": "I101",
          "Product": "Laptop",
          "Quantity": 1
        },
        {
          "ItemID": "I102",
          "Product": "Mouse",
          "Quantity": 1
        }
      ]
    }
  ]
}
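The nesting shown above can be produced from relational-style rows with a few lines of code. The following is a minimal Python sketch, not an official migration procedure; the function name `embed_customer` and the in-memory row lists are hypothetical and only illustrate how foreign keys are replaced by nesting.

```python
import json

# Rows as they might come out of the three relational tables above.
customers = [{"CustomerID": "C101", "Name": "John Doe", "Address": "123 Street, GOTown"}]
orders = [{"OrderID": "O001", "CustomerID": "C101", "Date": "2021-12-01"}]
items = [
    {"ItemID": "I101", "OrderID": "O001", "Product": "Laptop", "Quantity": 1},
    {"ItemID": "I102", "OrderID": "O001", "Product": "Mouse", "Quantity": 1},
]


def embed_customer(customer, orders, items):
    """Return one document with the customer's orders and items nested inside.

    Hypothetical helper for illustration only.
    """
    doc = dict(customer)
    doc["Orders"] = []
    for order in orders:
        if order["CustomerID"] != customer["CustomerID"]:
            continue
        doc["Orders"].append({
            "OrderID": order["OrderID"],
            "Date": order["Date"],
            # The OrderID foreign key is dropped: nesting expresses the link.
            "Items": [
                {k: v for k, v in item.items() if k != "OrderID"}
                for item in items
                if item["OrderID"] == order["OrderID"]
            ],
        })
    return doc


customer_doc = embed_customer(customers[0], orders, items)
print(json.dumps(customer_doc, indent=2))
```

Note that an item stored in Azure Cosmos DB also needs an `id` property, which this sample document does not show.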
Benefits and Considerations
Storing related entities in the same document has several advantages:
- Enhances read performance: The document contains all the information required to read the data, eliminating the need for joining multiple tables, which can often be costly.
- Can improve write performance: All changes to a document are applied atomically, and related data is saved with a single write to Cosmos DB rather than one write per entity.
The technique, however, comes with some constraints:
- The maximum size of a document in Azure Cosmos DB is 2 MB. If you are working with large amounts of data for related entities, you might need to split it into separate documents.
- How efficiently you can query across different documents depends on how you have partitioned your data. Queries that span several partitions generally cost more than queries served from a single partition.
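The size constraint above can be checked before a write. This is a rough Python sketch under stated assumptions: the serialized JSON length is used as an approximation of the stored size (the service may account for size slightly differently), and the `headroom` factor of 0.8 is an arbitrary illustrative choice, not a recommended value.

```python
import json

MAX_ITEM_BYTES = 2 * 1024 * 1024  # the 2 MB document limit mentioned above


def serialized_size(doc):
    """Approximate the stored size as the UTF-8 length of compact JSON."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_in_one_document(doc, headroom=0.8):
    """True if the document uses less than `headroom` of the size limit.

    `headroom` is a hypothetical safety margin so that a document that keeps
    growing is split before it actually reaches the limit.
    """
    return serialized_size(doc) < MAX_ITEM_BYTES * headroom


small = {"CustomerID": "C101", "Orders": [{"OrderID": "O001"}]}
large = {
    "CustomerID": "C102",
    "Orders": [{"OrderID": f"O{i:07d}", "Note": "x" * 100} for i in range(20000)],
}

print(serialized_size(small), fits_in_one_document(small))
print(serialized_size(large), fits_in_one_document(large))
```

A document that fails this check is a candidate for splitting its embedded entities into separate documents.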
Conclusion
In conclusion, storing multiple related entities in the same document gives you an efficient model for data that is read and written together. However, you should carefully consider your application's data access patterns and document sizes before using this technique. Applying these concepts will help you build more effective Azure Cosmos DB solutions.
Practice Test
True/False: Storing multiple related entities in the same document is called denormalization.
- True
- False
Answer: True
Explanation: Denormalization is the process of storing multiple related entities in the same document to increase query efficiency.
True/False: Azure Cosmos DB supports the storage of multiple related entities in the same document.
- True
- False
Answer: True
Explanation: Azure Cosmos DB supports many data models, including the storage of multiple related entities in the same document.
Which of the following considerations is NOT important when deciding to store multiple related entities in the same document in Azure Cosmos DB?
- a) Access patterns
- b) Data frequency
- c) Data consistency
- d) The color of the database
Answer: d) The color of the database
Explanation: The color of a database has no bearing on decisions about how data is stored.
True/False: Storing multiple related entities in the same document can result in faster queries.
- True
- False
Answer: True
Explanation: This approach may result in faster queries because related entities are fetched in a single retrieval operation.
In which of the following scenarios would you NOT store multiple related entities in the same document?
- a) You need atomic transactions across multiple documents
- b) You frequently read entire documents
- c) You often fetch related entities together
- d) You want to reduce the cost of read operations
Answer: a) You need atomic transactions across multiple documents
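Explanation: Options b, c, and d all describe access patterns that embedding serves well. Embedding gives you atomicity only within a single document; if your requirement is specifically a transaction that spans multiple documents, embedding does not provide it, and you would rely on a transactional batch or stored procedure scoped to one logical partition instead.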
Which data modelling technique in Azure Cosmos DB allows storing multiple related entities in the same document?
- a) Normalization
- b) Denormalization
- c) Sanitization
- d) Serialization
Answer: b) Denormalization
Explanation: Denormalization is the data modelling technique that incorporates multiple related entities in the same document in Azure Cosmos DB.
True/False: It is mandatory to denormalize data when using Azure Cosmos DB.
- True
- False
Answer: False
Explanation: Denormalization is one approach to data modeling in Azure Cosmos DB. Depending on the requirements, data could also be normalized.
True/False: Denormalization in Azure Cosmos DB can contribute to data redundancy.
- True
- False
Answer: True
Explanation: Denormalization may lead to data redundancy as it involves duplicating related data across documents.
In Azure Cosmos DB, embedding related documents can be beneficial for which of the following?
- a) Managing system security
- b) Optimizing read operations
- c) Managing database color schemes
- d) Optimizing write operations
Answer: b) Optimizing read operations
Explanation: By embedding related documents, you optimize read operations as there is less need for joins or additional read operations.
Which of the following is least likely to be a consideration when embedding related documents in Azure Cosmos DB?
- a) Frequency of data access
- b) Size of the resulting document
- c) Color of the database
- d) Amount of related data
Answer: c) Color of the database
Explanation: When embedding related documents in Azure Cosmos DB, the color of the database is not a factor.
Which is the correct statement in relation to Azure Cosmos DB?
- a) Azure Cosmos DB does not support denormalization
- b) Denormalization is mandatory in Azure Cosmos DB
- c) Azure Cosmos DB supports denormalization but it is not mandatory.
Answer: c) Azure Cosmos DB supports denormalization but it is not mandatory.
Explanation: While denormalization can be beneficial for certain use-cases in Azure Cosmos DB, it is not a requirement.
True/False: An advantage of storing multiple related entities in the same document in Azure Cosmos DB is that it can reduce the number of required read operations.
- True
- False
Answer: True
Explanation: Combining related entities in the same document can result in fewer read operations, thus improving the efficiency of data retrieval.
True/False: Storing multiple related entities in the same document on Azure Cosmos DB can increase the complexity of the data structure.
- True
- False
Answer: True
Explanation: While there can be advantages, embedding multiple related entities in the same document can also result in a more complex data structure.
True/False: Storing multiple related entities in the same document cannot help in reducing the cross-partition queries in Azure Cosmos DB.
- True
- False
Answer: False
Explanation: Storing multiple related entities in the same document can reduce the cross-partition queries, thus improving the efficiency and performance.
Which of the following is TRUE when designing and implementing cloud-native applications using Microsoft Azure Cosmos DB?
- a) Denormalized data will result in slower queries
- b) Storing multiple related entities in the same document can reduce data redundancy
- c) Denormalization is the process of splitting related entities into separate documents
- d) Azure Cosmos DB supports storing multiple related entities in the same document
Answer: d) Azure Cosmos DB supports storing multiple related entities in the same document
Explanation: Azure Cosmos DB supports storing multiple related entities in the same document, which is a technique known as denormalization.
Interview Questions
What is the main advantage of storing multiple related entities in the same document in Microsoft Azure Cosmos DB?
Related data that is read together can be fetched in a single read operation, with no joins or extra round trips, which improves read performance. Azure Cosmos DB also indexes all properties by default, so nested properties can be queried without defining a schema or secondary indexes.
Can different related entities with varying schemas be stored in the same document in Cosmos DB?
Yes, Cosmos DB is a schema-agnostic database and it can store related entities with varying schemas in the same document.
How does Azure Cosmos DB ensure data consistency when storing multiple related entities in the same document?
A write to a single document is atomic, so entities embedded in the same document are always updated together. For reads across replicas, Azure Cosmos DB offers five consistency levels: Strong, Bounded staleness, Session, Consistent prefix, and Eventual.
How are complex relationships handled in Azure Cosmos DB while storing multiple related entities in the same document?
Cosmos DB uses the hierarchical structure of the JSON model to handle complex relationships. Entities that have a 1:1 or 1:Few relationship can be embedded within a single document.
What strategy is often used to limit the need for cross partition queries in Azure Cosmos DB?
The strategy is to place related information into the same document and/or place related documents that will be queried together in the same partition.
What should be kept in mind with regards to the document size while storing multiple related entities in the same document in Azure Cosmos DB?
Each Azure Cosmos DB item or document can be up to 2 MB in size, so the total size of the related entities embedded in a document must stay within this limit.
Which query language does Azure Cosmos DB use for retrieving related entities stored in the same document?
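Azure Cosmos DB for NoSQL (formerly the SQL API) uses a SQL-like query language for querying JSON documents. Nested entities are reached with dot notation, and arrays inside a document can be iterated with an intra-document JOIN.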
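As an illustration of querying embedded entities, the sketch below pairs a query in the SQL-like language with a plain Python function that computes the same rows locally. It is a hedged example: the query is only held in a string and is not sent to any service, the container alias `c` and the helper name `flatten_items` are assumptions, and the sample document reuses the customer example from earlier in the article.

```python
# A query in the SQL-like language. JOIN here iterates arrays that live
# inside a single document; it does not join separate documents.
QUERY = """
SELECT c.Name, o.OrderID, i.Product, i.Quantity
FROM c
JOIN o IN c.Orders
JOIN i IN o.Items
WHERE c.CustomerID = "C101"
"""

doc = {
    "CustomerID": "C101",
    "Name": "John Doe",
    "Orders": [
        {
            "OrderID": "O001",
            "Date": "2021-12-01",
            "Items": [
                {"ItemID": "I101", "Product": "Laptop", "Quantity": 1},
                {"ItemID": "I102", "Product": "Mouse", "Quantity": 1},
            ],
        }
    ],
}


def flatten_items(doc):
    """Local equivalent of the query: one row per (order, item) pair.

    Hypothetical helper, shown only to make the JOIN semantics concrete.
    """
    rows = []
    for order in doc.get("Orders", []):
        for item in order.get("Items", []):
            rows.append({
                "Name": doc["Name"],
                "OrderID": order["OrderID"],
                "Product": item["Product"],
                "Quantity": item["Quantity"],
            })
    return rows


rows = flatten_items(doc)
for row in rows:
    print(row)
```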
Is there a way to model data stored in Azure Cosmos DB to optimize cost?
Yes, data modeling in Azure Cosmos DB should be done keeping in mind the Request Unit (RU) charge which is associated with each operation. Keeping related entities in the same document can optimize the cost as it reduces the number of operations.
Can Azure Cosmos DB handle data redundancy in documents where multiple related entities are stored together?
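Azure Cosmos DB does not automatically keep duplicated (denormalized) data in sync; if the same data is copied into several documents, the application must update each copy, for example by reacting to the change feed. This is separate from replication, where Cosmos DB does automatically maintain multiple copies of each item for high availability and global distribution.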
If you have a high volume of related entities, is it advisable to store all of them in one single document?
Storing too many related entities in a single document can hurt performance, make the document harder to manage, and risk exceeding the size limit. In such cases, it's advisable to store the related entities as separate documents that share a partition key so they stay in the same logical partition.
What types of data relationships are most suitable for storing multiple related entities in the same document in Azure Cosmos DB?
Entities with a 1:1 or 1:Few relationship are good candidates for this style of data modeling.
How do you handle concurrency when multiple clients are updating the multiple related entities in the same document?
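Azure Cosmos DB supports optimistic concurrency control through entity tags (ETags). Each document carries an _etag value that changes on every update; a client sends the ETag it read in an If-Match condition, and the write is rejected if the document has changed in the meantime, which prevents clients from overwriting each other's changes.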
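The ETag check can be pictured with a small in-memory simulation. This Python sketch does not use the Azure SDK; `InMemoryContainer`, `replace_if_match`, and `PreconditionFailed` are hypothetical stand-ins that only mimic the behavior of a conditional write being rejected (HTTP 412) when the ETag is stale.

```python
import copy
import uuid


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 response a stale conditional write receives."""


class InMemoryContainer:
    """Toy store that assigns a new _etag on every successful write."""

    def __init__(self):
        self._items = {}

    def upsert(self, doc):
        stored = copy.deepcopy(doc)
        stored["_etag"] = uuid.uuid4().hex
        self._items[stored["id"]] = stored
        return copy.deepcopy(stored)

    def read(self, item_id):
        return copy.deepcopy(self._items[item_id])

    def replace_if_match(self, doc, etag):
        # Reject the write if someone else updated the document first.
        if self._items[doc["id"]]["_etag"] != etag:
            raise PreconditionFailed(doc["id"])
        return self.upsert(doc)


container = InMemoryContainer()
container.upsert({"id": "C101", "Name": "John Doe", "Orders": []})

# Two clients read the same version of the document.
client_a = container.read("C101")
client_b = container.read("C101")

client_a["Orders"].append({"OrderID": "O001"})
container.replace_if_match(client_a, client_a["_etag"])  # first writer wins

client_b["Orders"].append({"OrderID": "O002"})
try:
    container.replace_if_match(client_b, client_b["_etag"])
    conflict = False
except PreconditionFailed:
    conflict = True
    # Typical recovery: re-read, re-apply the change, and retry.
    fresh = container.read("C101")
    fresh["Orders"].append({"OrderID": "O002"})
    container.replace_if_match(fresh, fresh["_etag"])

final_orders = [o["OrderID"] for o in container.read("C101")["Orders"]]
print(conflict, final_orders)
```

The retry step matters with embedded entities: because both clients edit the same document, the second client has to merge its change into the latest version rather than overwrite it.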
What happens if a single document’s total size exceeds Cosmos DB’s size limit when storing multiple related entities in the same document?
If a single document’s size exceeds 2 MB, you’ll need to split data across multiple documents or reduce the amount of data you’re storing in a document.
Does keeping multiple related entities together in the same document guarantee atomic transactions in Cosmos DB?
Yes, for that document. A write to a single document is atomic, so all the entities embedded in it are updated together or not at all. Atomicity across several documents requires a transactional batch or stored procedure within one logical partition.
How to decide between storing related entities in the same document and referencing between documents in Cosmos DB?
It depends on the type of relationship and on access patterns. For one-to-one and one-to-few relationships, embedding in the same document is preferred. For one-to-many relationships that are unbounded or very large, referencing between documents is more suitable.
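To make the referencing alternative concrete, the following Python sketch splits an embedded customer document into one customer document plus one document per order. It is an illustration under assumptions: the `type` discriminator, the reuse of `CustomerID` as a shared partition key value, and the helper name `split_into_references` are hypothetical choices, not prescribed by Azure Cosmos DB. The `id` property is included because every stored item needs one.

```python
def split_into_references(customer_doc):
    """Turn one embedded document into a customer item plus one item per order.

    Hypothetical helper. Each order keeps its few items embedded, but refers
    back to its customer through CustomerID instead of being nested inside it.
    """
    customer_id = customer_doc["CustomerID"]

    customer_item = {k: v for k, v in customer_doc.items() if k != "Orders"}
    customer_item.update({"id": customer_id, "type": "customer"})

    order_items = [
        {
            "id": order["OrderID"],
            "type": "order",
            "CustomerID": customer_id,  # reference; could also be the partition key
            **order,
        }
        for order in customer_doc.get("Orders", [])
    ]
    return [customer_item] + order_items


embedded = {
    "CustomerID": "C101",
    "Name": "John Doe",
    "Orders": [
        {
            "OrderID": "O001",
            "Date": "2021-12-01",
            "Items": [{"ItemID": "I101", "Product": "Laptop", "Quantity": 1}],
        }
    ],
}

documents = split_into_references(embedded)
for d in documents:
    print(d["type"], d["id"])
```

With this shape, the customer document stays small no matter how many orders accumulate, at the cost of an extra read or query to fetch the orders.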