Aggregate queries in Azure CosmosDb with large datasets
One of the services I’m building at one of my customers is an API that provides invoice information for a customer self-service portal. The invoice information is stored (of course) in Azure CosmosDb. Invoices are partitioned by customerid, but those partitions can still contain a lot of items.
When querying for the total outstanding amount for not payed invoices you can use an aggregate query:
SELECT SUM(c.OutstandingAmount) AS TotalOutstandingAmount FROM c WHERE c.Status <> 1
We executed the query with the following code:
var feedOptions = new FeedOptions
{
EnableCrossPartitionQuery = false,
PartitionKey = new PartitionKey(partitionKey)
};
var querySpec = new SqlQuerySpec()
{
QueryText = queryText,
Parameters = new SqlParameterCollection(queryParameters.Select(pair => new SqlParameter(pair.Key, pair.Value)))
};
using( var query = _documentClient.Value.CreateDocumentQuery<T>(collectionUri, querySpec, feedOptions).AsDocumentQuery())
{
var response = await query.ExecuteNextAsync<T>(cancellationToken);
}
return response.First();
But sometimes this returned an item with an amount set to 0 where I knew this should not be the case. When I ran the same code against a test database, the issue did not arise.
So I resorted to Fiddler to help me find the issue between the two queries.
First I tried running the query against the test database:
And then against the acceptation database:
As you can see, the latter returns a continuation token in the response through a response header. So this means we should continue asking for results in our code:
var items = new List<T>();
using (var query = _documentClient.Value.CreateDocumentQuery<T>(collectionUri, querySpec, feedOptions).AsDocumentQuery())
{
while (query.HasMoreResults)
{
var response = await query.ExecuteNextAsync<T>(cancellationToken);
items.AddRange(response);
}
}
return items;
Then you can create an aggregated result by using Linq to sum up the values of the returned items.