MongoEngine – Aggregation
”;
The term ‘aggregation’ is used for the operation that processes data and returns computed result. Finding sum, count and average on one or more fields of documents in a collection can be called as aggregation functions.
MongoEngine provides aggregate() function that encapsulates PyMongo’s aggregation framework. Aggregation operation uses a collection as input and returns one or more documents as a result.
MongoDB uses concept of data processing pipelines. A pipeline can have multiple stages. Basic stage provides that provide filter and operate like queries. Others provide tools for grouping and/or sorting by one or more fields, string concatenation tasks, array aggregation tools, etc.
Following stages are defined in MongoDB pipeline creation −
Name | Description |
---|---|
$project | Reshapes each document in the stream, by adding new fields or removing existing fields. |
$match | Filters the document stream to allow only matching documents to pass unmodified into the next stage. $match uses standard MongoDB queries. |
$redact | Reshapes each document by restricting the content for each document based on information stored in the documents themselves. |
$limit | Limits documents to be passed unmodified to the pipeline |
$skip | Skips the first n documents and passes the remaining documents unmodified to the pipeline. |
$group | Groups input documents by a given identifier expression and applies the accumulator expressions to each group. The output documents only contain the identifier field and accumulated fields. |
$sort | Reorders the document stream by a specified sort key. |
$out | Writes the resulting documents of the aggregation pipeline to a collection. |
Aggregation expressions use field path to access fields in the input documents. To specify a field path, use a string that prefixes with a dollar sign $$$ the field name. Expression can use one or more Boolean operators ($and, $or, $not) and comparison operators ($eq, $gt, $lt, $gte, $lte and $ne).
Following arithmetic expressions are also used for aggregation −
$add | Adds numbers to return the sum. Accepts any number of argument expressions |
$subtract | Returns the result of subtracting the second value from the first |
$multiply | Multiplies numbers to return the product. Accepts any number of argument expressions |
$divide | Returns the result of dividing the first number by the second. Accepts two argument expressions |
$mod | Returns the remainder of the first number divided by the second. Accepts two argument expressions |
Following string expression can also be used in aggregation −
$concat | Concatenates any number of strings |
$substr | Returns a substring of a string, starting at a specified index position up to a specified length |
$toLower | Converts a string to lowercase. Accepts a single argument expression |
$toUpper | Converts a string to uppercase. Accepts a single argument expression |
$strcasecmp | Performs string comparison and returns 0 if two strings are equivalent, 1 if first is greater than second, and -1 if first string is less than second |
To demonstrate how aggregate() function works in MongoEngine, let us first define a Document class called orders.
from mongoengine import * con=connect(''mydata'') class orders(Document): custID = StringField() amount= IntField() status = StringField()
We then add following documents in orders collection −
_id | custID | amount | status |
---|---|---|---|
ObjectId(“5eba52d975fa1e26d4ec01d0”) | A123 | 500 | A |
ObjectId(“5eba536775fa1e26d4ec01d1”) | A123 | 250 | A |
ObjectId(“5eba53b575fa1e26d4ec01d2”) | B212 | 200 | D |
ObjectId(“5eba540e75fa1e26d4ec01d3”) | B212 | 400 | A |
The aggregate() function is to be used to find sum of amount field for each custID only when status equals ‘A’. Accordingly, the pipeline is constructed as follows.
First stage in pipeline uses $match to filter documents with status=’A’. Second stage uses $group identifier to group documents on CustID and performs sum of amount.
pipeline = [ {"$match" : {"status" : "A"}}, {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}} ]
This pipeline is now used as argument to aggregate() function.
docs = orders.objects().aggregate(pipeline)
We can iterate over the document cursor with a for loop. The complete code is given below −
from mongoengine import * con=connect(''mydata'') class orders(Document): custID = StringField() amount= IntField() status = StringField() pipeline = [ {"$match" : {"status" : "A"}}, {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}} ] docs = orders.objects().aggregate(pipeline) for doc in docs: print (x)
For the given data, the following output is generated −
{''_id'': ''B212'', ''total'': 400} {''_id'': ''A123'', ''total'': 750}
”;