Azure Cosmos DB SQL API – Document Analytics

Ranjith Eswaran

Jun 15, 2021

Category: Azure Cosmos DB

Introduction

In this blog, we will understand the basics of Cosmos DB partitions and the use of Cerebrata Cerulean in visualizing the data stored in Cosmos DB SQL API containers based on the partition keys.

Logical Partitions

A logical partition consists of a set of items that have the same partition key. For example, in a container that contains data about customer addresses, all items contain a state property. You can use state as the partition key for the container. Groups of items that have specific values for state form distinct logical partitions. When new items are added to a container, new logical partitions are automatically created by the system.

There is no limit to the number of logical partitions in your container. Each logical partition can store up to 20 GB of data. Good partition key choices have a wide range of possible values.
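
To make the idea concrete, here is a minimal sketch of how items fall into logical partitions by their partition key value. It is only an in-memory illustration: the function name group_by_partition_key and the sample items are hypothetical, and Azure Cosmos DB performs this grouping internally.

```python
from collections import defaultdict


def group_by_partition_key(items, key):
    """Group items into logical partitions by the value of `key`."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return dict(partitions)


# Hypothetical sample items; only the fields needed for the illustration.
customers = [
    {"id": "CS001", "state": "New York"},
    {"id": "CS002", "state": "Texas"},
    {"id": "CS003", "state": "New York"},
]

logical_partitions = group_by_partition_key(customers, "state")
for state, docs in logical_partitions.items():
    print(state, [doc["id"] for doc in docs])
```

Each distinct value of state yields one logical partition, so adding an item with a new state value would create a new group.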

Physical Partitions

A container is scaled by distributing data and throughput across physical partitions. One or more logical partitions are mapped to a single physical partition internally. Typically, smaller containers have many logical partitions, but they only require a single physical partition. Unlike logical partitions, physical partitions are an internal implementation of the system, and they are entirely managed by Azure Cosmos DB.

The number of physical partitions in a container depends on the following:

  • The amount of throughput provisioned (each individual physical partition can provide a throughput of up to 10,000 request units per second). The 10,000 RU/s limit for physical partitions implies that logical partitions also have a 10,000 RU/s limit, as each logical partition is only mapped to one physical partition.
  • The total data storage (each individual physical partition can store up to 50 GB of data).
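
The two limits above suggest a rough lower bound on the number of physical partitions a container needs. The sketch below is an assumption-based estimate only: the function name estimate_min_physical_partitions is hypothetical, and the actual partition count is decided and managed by Azure Cosmos DB, which may use more partitions than this bound.

```python
import math

# Per-physical-partition limits quoted in this article.
MAX_RU_PER_PHYSICAL_PARTITION = 10_000
MAX_GB_PER_PHYSICAL_PARTITION = 50


def estimate_min_physical_partitions(provisioned_ru, storage_gb):
    """Return a lower bound on physical partitions for the given load."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    # A container always has at least one physical partition.
    return max(1, by_throughput, by_storage)


# 30,000 RU/s needs at least 3 partitions; 120 GB also needs at least 3.
print(estimate_min_physical_partitions(30_000, 120))
```

Whichever of the two requirements is larger drives the estimate, which is why a small container with modest throughput fits in a single physical partition.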

Choosing a Partition key

Recall that each logical partition can store up to 20 GB of data, and it is not possible to insert more data into a logical partition once that limit is reached. Let us consider a scenario where we need to store the addresses of all customers in a Cosmos DB SQL API container. A sample document to be stored has the following format.

{
  "id": "CS001",
  "firstName": "John",
  "lastName": "Smith",
  "addressLine1": "132, My Street",
  "addressLine2": "Kingston",
  "state": "New York",
  "zipCode": "12401",
  "country": "United States"
}

The partition key path chosen for this container is "/state". This works well when the number of customers in a state is small or moderate. But when a state has a very large number of customers, many records share the same partition key value, and the logical partition can grow until it reaches the 20 GB limit.

It is therefore important to choose a partition key that holds a wide range of values. In the above case, zipCode (partition key path "/zipCode") would be a better choice.
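
The difference between the two candidate keys can be illustrated by measuring how much of the data ends up in the largest logical partition. This is a minimal sketch on hypothetical sample data; the helper largest_partition_share is an assumption for illustration and is not part of any Cosmos DB SDK.

```python
from collections import Counter


def largest_partition_share(items, key):
    """Return the fraction of items that fall in the largest logical partition."""
    counts = Counter(item[key] for item in items)
    return max(counts.values()) / len(items)


# Hypothetical customers: most live in one state but in different zip codes.
customers = [
    {"id": "CS001", "state": "New York", "zipCode": "12401"},
    {"id": "CS002", "state": "New York", "zipCode": "10001"},
    {"id": "CS003", "state": "New York", "zipCode": "14201"},
    {"id": "CS004", "state": "Vermont", "zipCode": "05401"},
]

state_share = largest_partition_share(customers, "state")
zip_share = largest_partition_share(customers, "zipCode")
print(f"/state: {state_share:.0%} of items in the largest partition")
print(f"/zipCode: {zip_share:.0%} of items in the largest partition")
```

A lower share means the data is spread more evenly, so no single logical partition approaches the 20 GB limit as quickly.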

Document analytics in Cerebrata Cerulean

In Cerebrata Cerulean, it is possible to visualize the data stored in each logical partition within a container. This is available through the document analytics feature. Through this feature, we can view the total size of the documents in each logical partition, which helps us understand how the records are distributed across logical partitions.

Steps in viewing the Cosmos DB SQL API document analytics

Navigate to the required Cosmos DB SQL API container within the desired database.

Select Document analytics from the context menu of the container.

The document analytics will be displayed in both graphical format and tabular format.

Capabilities in Document analytics

Search Criteria: It is possible to retrieve the documents that were modified within a provided date-time interval. This can be done by selecting the Search Criteria option.

Document count: It is possible to visualize the total number of documents in a logical partition by grouping the records by document count.

Document size: It is possible to visualize the total size of documents in a logical partition by grouping the records by document size.
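
For readers who want to see what these two groupings compute, the sketch below derives a document count and an approximate total size per logical partition from an in-memory list of documents. It is not how Cerulean is implemented: the function partition_stats is hypothetical, and the size of the serialized JSON is only an approximation of the storage size reported by Cosmos DB.

```python
import json
from collections import defaultdict


def partition_stats(items, key):
    """Return document count and approximate size in bytes per partition key value."""
    stats = defaultdict(lambda: {"count": 0, "size_bytes": 0})
    for item in items:
        entry = stats[item[key]]
        entry["count"] += 1
        # Approximate the document size by its UTF-8 encoded JSON length.
        entry["size_bytes"] += len(json.dumps(item).encode("utf-8"))
    return dict(stats)


# Hypothetical documents partitioned by state.
documents = [
    {"id": "CS001", "firstName": "John", "state": "New York"},
    {"id": "CS002", "firstName": "Jane", "state": "New York"},
    {"id": "CS003", "firstName": "Ravi", "state": "Texas"},
]

stats = partition_stats(documents, "state")
for value, entry in stats.items():
    print(value, entry["count"], "documents,", entry["size_bytes"], "bytes")
```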

Features in Document Analytics

Export Chart: It is also possible to export the document analytics as an image file. This can be done through the Export chart option.

Export data: It is also possible to export the document analytics as a CSV or Excel file. This can be done by using the Export data option.

Other features for Cosmos DB SQL API

Cerebrata Cerulean also provides some other interesting features for Cosmos DB like management of databases, containers, stored procedures, functions, and triggers of containers in a Cosmos DB account. It is also possible to query the documents in a container, insert a new document, update, and delete the existing documents. It is also possible to copy the documents from one container to another container.

Conclusion

In this blog, we understood how Cerebrata Cerulean can be used to visualize the logical partitions in a Cosmos DB SQL API container and the basics of logical and physical partitions. Apart from Azure Cosmos DB SQL API, Cerebrata Cerulean enables you to manage your Azure Cosmos DB accounts (Gremlin API and Table API), Service Bus Namespaces, Cognitive Search Service accounts, Redis Cache accounts, and much more. It is also cross-platform so that you can manage your Azure resources from a platform of your choice – Windows, Mac, or Linux. Please visit https://www.cerebrata.com to learn more.