Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the performance needs of an application. A partition key is a property in the items of a container. All items with the same partition key value form one logical partition and are stored together. A synthetic partition key is a partition key whose value is composed from several attributes of the document.
When the properties already present in the data do not make an adequate partition key, a common strategy is to create a synthetic partition key that results in better logical partitioning for the application. The most common reason for creating one is to raise the number of distinct key values so that data and requests spread more evenly across partitions.
Why Implement a Synthetic Partition Key?
A well-chosen partition key is vital for an application's performance and scalability. It helps you distribute your workload evenly across logical partitions. If an application's data isn't evenly distributed, a few partitions receive most of the storage and requests, so requests to them may be throttled and the application's capacity to scale is limited.
In such a case, a synthetic partition key may help. Combining several attributes into one key produces many more distinct key values, which lets documents and requests spread more uniformly across partitions.
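A rough way to see the effect is to hash key values into a fixed number of buckets and compare how full each bucket gets. The sketch below is only an illustration: the bucket count, the sample data, and the use of MD5 are assumptions made for the example, and the real hashing and partition placement in Cosmos DB are internal to the service.

```python
import hashlib
from collections import Counter


def bucket_for(key: str, buckets: int = 4) -> int:
    """Map a key value to one of `buckets` buckets (stand-in for a partition)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def distribution(keys, buckets: int = 4) -> Counter:
    """Count how many keys land in each bucket."""
    return Counter(bucket_for(k, buckets) for k in keys)


# Hypothetical skewed data set: 900 Pop songs and 100 Jazz songs.
songs = [("Pop", f"Song{i}") for i in range(900)]
songs += [("Jazz", f"Song{i}") for i in range(900, 1000)]

# Key on Genre alone: only two distinct values, so at most two buckets are used.
by_genre = distribution(genre for genre, _ in songs)

# Key on Genre plus SongID: one distinct value per song.
by_synthetic = distribution(f"{genre}-{song_id}" for genre, song_id in songs)

print("genre only:     ", dict(by_genre))
print("genre + song id:", dict(by_synthetic))
```

With the genre-only key, every Pop song lands in the same bucket, whereas the combined key gives the hash many more distinct values to spread around.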
Constructing and Implementing a Synthetic Partition Key
There isn't a fixed rule for creating a synthetic key. The right design depends on the specifics of the application and its data. The primary goal, however, should be an even distribution of storage and throughput across all partitions.
Here’s an example to illustrate how to make and use a synthetic partition key.
Consider a music application where data is stored in Azure Cosmos DB. It contains a ‘Songs’ container with SongID and Genre as two properties. A traditional approach might use SongID as the partition key. Still, if you want to query all songs from a specific genre, the query operation would become a cross-partition query, which might lead to higher cost and slower operation.
A synthetic partition key can be formed by concatenating SongID and Genre.
{
"id": "1",
"SongID": "Song1",
"Genre": "Pop",
"syntheticKey": "Pop-Song1"
}
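Here, "syntheticKey" is the concatenation of Genre and SongID, and the container is created with this property as its partition key. Point reads and single-partition queries then supply the full "syntheticKey" value. Note that a filter on Genre alone does not match a single key value, so such a query still fans out across partitions.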
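The key in the sample document can be built in application code before the item is written. The following is a minimal sketch, assuming the "Genre-SongID" layout shown above. The helper names `make_synthetic_key` and `with_synthetic_key` are hypothetical and not part of any Cosmos DB SDK.

```python
def make_synthetic_key(genre: str, song_id: str, sep: str = "-") -> str:
    """Concatenate Genre and SongID in the same order as the sample document."""
    return f"{genre}{sep}{song_id}"


def with_synthetic_key(item: dict) -> dict:
    """Return a copy of the item with the 'syntheticKey' property added."""
    enriched = dict(item)  # shallow copy, so the caller's dict is not modified
    enriched["syntheticKey"] = make_synthetic_key(item["Genre"], item["SongID"])
    return enriched


song = {"id": "1", "SongID": "Song1", "Genre": "Pop"}
doc = with_synthetic_key(song)
print(doc["syntheticKey"])  # prints "Pop-Song1"
```

Keeping the construction in one function means that writers and readers build the key the same way, which matters because a lookup must supply exactly the value that was stored.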
Conclusion
A well-designed synthetic partition key can noticeably improve performance and efficiency in Cosmos DB. By keeping data and requests evenly distributed across partitions, it reduces throttling, latency spikes, and poor resource utilization.
A synthetic key is not always necessary. Inspect the nature of your workload and the queries you run, and decide on that basis whether creating one would help with scaling and partitioning.
The DP-420 exam, Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB, expects an understanding of how Cosmos DB works internally, including how partition keys are managed. Mastering the use and implementation of synthetic partition keys is therefore worthwhile.
Practice Test
True or False: A synthetic partition key is a key that comprises a combination of two or more properties within an item.
- Answer: True
Explanation: A synthetic partition key is composed of two or more property values that are concatenated into a single value.
What is the main advantage of using synthetic partition keys in Cosmos DB?
- A. Improve the query performance
- B. Spread data and traffic evenly across partitions
- C. All of the above
Answer: C. All of the above
Explanation: Using synthetic partition keys can both improve the overall query performance and distribute data and traffic more evenly across the partitions.
True or False: A synthetic partition key helps to limit transaction scope in Cosmos DB.
- Answer: True
Explanation: Transactions in Cosmos DB are scoped to a single partition key. Thus, a synthetic partition key can effectively limit the transaction scope.
When is using a synthetic partition key recommended in Cosmos DB?
- A. When there are large amounts of data
- B. When partition key values are evenly distributed
- C. When there’s need for cross-partition queries
- D. When partition key values are unevenly distributed
Answer: D. When partition key values are unevenly distributed
Explanation: Synthetic partition keys can be particularly useful when partition key values are unevenly distributed, as they effectively distribute traffic across all partitions.
What is the maximum storage limit per partition key value in Cosmos DB?
- A. 10 GB
- B. 20 GB
- C. 10 TB
- D. Unlimited
Answer: B. 20 GB
Explanation: Each logical partition, that is, all items sharing one partition key value, can store up to 20 GB of data.
Which of the following is not a consideration while constructing and implementing synthetic partition key?
- A. Data distribution
- B. Query patterns
- C. Latency rate
- D. Frequency of database access
Answer: D. Frequency of database access
Explanation: Considerations for creating synthetic partition keys do not generally include database access frequency, but focus more on data distribution, query patterns, and latency.
True or False: One disadvantage of a synthetic partition key is that it can increase overall storage costs.
- Answer: False
Explanation: Partition keys do not add significantly to storage costs. They primarily improve data accessibility and query performance.
Which API should be used to interact with Cosmos DB and use synthetic partition keys?
- A. SQL API
- B. MongoDB API
- C. Azure Table API
- D. All of the above
Answer: D. All of the above
Explanation: Cosmos DB supports SQL API, MongoDB API, Azure Table API, and more. Synthetic partition keys can be used with all these APIs.
Can you manually override the partition key on an item in Cosmos DB?
- A. Yes
- B. No
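Answer: B. No
Explanation: Cosmos DB does not assign a partition key on its own. The value is read from the item property at the container's partition key path. An SDK call may pass the partition key explicitly, but it must match the value stored in the item, so moving an item to a different partition key value means deleting it and creating it again.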
True or False: Synthetic partition keys can help overcome the limitation of the maximum throughput provisioned for a single partition key value.
- Answer: True
Explanation: By distributing data and read/write operations across multiple partition key values, synthetic partition keys can help to handle a higher overall throughput.
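One way to spread a single hot value over several partition key values is to append a suffix to it. The sketch below derives the suffix from another property of the item, so that a reader can recompute it later. The function names, the fan-out of 10, and the "." separator are all assumptions chosen for illustration.

```python
import hashlib


def suffixed_key(base: str, item_id: str, fanout: int = 10) -> str:
    """Append a deterministic suffix in [0, fanout) derived from item_id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % fanout
    return f"{base}.{suffix}"


def all_keys(base: str, fanout: int = 10) -> list:
    """Every key value a reader must cover to see all items for `base`."""
    return [f"{base}.{n}" for n in range(fanout)]


# Hypothetical hot value "Pop" spread over up to ten key values.
keys = {suffixed_key("Pop", f"Song{i}") for i in range(1000)}
print(sorted(keys))
```

The trade-off is on the read side: a lookup by song can recompute the one key it needs, but reading everything for "Pop" has to cover every value returned by `all_keys`.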
Interview Questions
What is a synthetic partition key in Azure Cosmos DB?
A synthetic partition key in Azure Cosmos DB is a type of partition key created as a combination of two or more property values in a JSON document instead of using a single property value as partition key.
How does a synthetic partition key benefit data distribution?
A synthetic partition key provides more granular control over data distribution, which can lead to a more even spread of data across partitions and better read and write scalability.
How does Cosmos DB handle synthetic partition keys in the sharding process?
Cosmos DB treats a synthetic partition key like any other partition key. It hashes the key value to decide where each logical partition is placed, so a key with many distinct values gives a better balance of data and improves read and write performance.
How do you create a synthetic partition key in Cosmos DB?
You can create a synthetic partition key by concatenating two or more property values into one in the JSON document. You will then define this combined value as the partition key when you create a container.
Is it mandatory to use a synthetic partition key in Cosmos DB?
No, it’s not mandatory to use a synthetic partition key. It’s a recommended approach if the natural partition key options do not distribute data evenly among partitions.
When can a synthetic partition key be particularly useful?
A synthetic partition key can be particularly useful when dealing with large volumes of data that need to be distributed across numerous partitions for optimal performance.
What is the maximum size of a partition key value?
The maximum size of a partition key value is 2 KB.
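Because concatenated keys can grow long, it can be worth checking their size before writing. The sketch below measures the UTF-8 encoded length against the 2 KB figure quoted above. The helper name `check_key` is hypothetical, and the exact way the service measures the value is not covered by this article, so treat the check as a conservative guard rather than the service's own rule.

```python
MAX_KEY_BYTES = 2048  # the 2 KB limit quoted in this article


def check_key(value: str, limit: int = MAX_KEY_BYTES) -> int:
    """Return the UTF-8 size of the key value, or raise if it exceeds the limit."""
    size = len(value.encode("utf-8"))
    if size > limit:
        raise ValueError(f"partition key value is {size} bytes, limit is {limit}")
    return size


print(check_key("Pop-Song1"))  # prints 9
```

Measuring bytes rather than characters matters for non-ASCII text, where a single character can take several bytes.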
What is the impact of a poorly chosen partition key in Cosmos DB?
A poorly chosen partition key can lead to an imbalanced distribution of data and requests, creating hot partitions. This eventually limits the scalability and performance of the container.
What data types are allowed for partition keys in Cosmos DB?
Azure Cosmos DB natively supports the following data types for partition keys: string, number, and Boolean.
Does the order of values in a synthetic partition key matter?
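Only for consistency. For a key built by concatenation, the whole string is hashed as a single value, so the order of its parts does not create a hierarchy. It does need to stay the same everywhere, because readers must rebuild exactly the string that writers stored. Ordered levels, where the leading value is coarse-grained and later values are finer-grained, belong to hierarchical partition keys, which are a separate feature.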
How many characters can a single partition key in Cosmos DB handle?
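A partition key value can be up to 2 KB (2,048 bytes) long, so the limit is measured in bytes rather than characters. Containers without large partition key support are limited to 101 bytes.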
How does Cosmos DB guarantee uniqueness of key with synthetic partition key?
The id of an item must be unique within its logical partition, so the combination of partition key value and id identifies an item uniquely across the container. This holds for synthetic partition keys as well: as long as the combination of partition key and id is unique, Cosmos DB maintains unique keys.
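The rule can be pictured with a small in-memory model in which the pair of partition key value and id acts as the item's identity. This is a toy sketch for illustration only. `TinyContainer` is a hypothetical class and does not reflect how Cosmos DB is implemented or how its SDK behaves.

```python
class TinyContainer:
    """Toy model: items are identified by (partition key value, id)."""

    def __init__(self, partition_key_field: str):
        self.partition_key_field = partition_key_field
        self._items = {}

    def create(self, item: dict) -> None:
        identity = (item[self.partition_key_field], item["id"])
        if identity in self._items:
            raise KeyError(f"item {identity} already exists")
        self._items[identity] = item

    def __len__(self) -> int:
        return len(self._items)


songs = TinyContainer("syntheticKey")
songs.create({"id": "1", "syntheticKey": "Pop-Song1"})
# Same id, different partition key value: accepted.
songs.create({"id": "1", "syntheticKey": "Jazz-Song2"})
print(len(songs))  # prints 2
```

Creating a second item with both the same id and the same "syntheticKey" value would raise an error in this model, which mirrors the uniqueness rule described above.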
Can you change the partition key value after it has been created?
No, the partition key value is immutable once it has been created.
What does Cosmos DB do when a partition becomes full?
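When a physical partition fills up, Cosmos DB automatically splits it and redistributes its logical partitions across the new physical partitions. A single logical partition cannot be split, so once one partition key value reaches the 20 GB limit, further writes to that value fail until data is removed or the key design is changed.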
Can you delete a partition key in Cosmos DB?
No, once a partition key is set, it cannot be removed or changed. The container must be deleted and recreated with a new partition key.