Using NEST for ElasticSearch aggregate query

ElasticSearch aggregation

Aggregation in Elasticsearch is a mechanism for extracting and displaying data in a structured way. Think of it as the GROUP BY statement in SQL, but more powerful and flexible.

Aggregation type

Elasticsearch supports many types of aggregations, including:

  1. Metrics aggregations: Mainly used for statistics of indicators such as maximum value, minimum value, average value, and sum of fields. This type of aggregation calculates based on the numeric value of a document field and returns a single numeric result. For example, maximum value (max), minimum value (min), average value (average), sum (sum), statistical information (stats, including the above operations), and other complex aggregations such as percentages (percentiles), cardinality ( cardinality) etc.
  2. Bucket aggregations: Analogous to the role of group by in SQL, it is mainly used to count the number of different types of data. This type of aggregation creates a set of buckets, each bucket corresponding to a specific condition or range, and then documents are classified into the corresponding buckets based on these conditions or ranges. Common ones include range, date range, histogram, date histogram, geohash grid, etc.
  3. Pipeline aggregations: used for secondary aggregation of aggregation results. If you want to count the buckets with the largest number of bound tags, you must first bucket them according to the tags, and then calculate the maximum value on the bucketed results. Such aggregations can perform secondary calculations based on the results of other aggregations. For example, calculate differences, ratios, moving averages, etc.

Elasticsearch’s aggregation operation supports nesting, that is, an aggregation can contain other sub-aggregations, thereby realizing very complex data mining and statistical requirements.

Aggregation query syntax

{
  "size" : 0,
  "aggregations" : {
    "Self-named aggregate name 1" : {
      "filter" : {
        "bool" : {
          "filter" : [
            {
              "term" : {
                "Query field 1" : {
                  "value": Query value 1,
                  "boost" : 1.0
                }
              }
            }
          ]
        }
      },
      "aggregations" : {
        "Self-named aggregate name 2" : {
          "sum" : {
            "field" : "query value 2"
          }
        }
      }
    }
  }
} 

Write Aggregation

Nest provides 3 ways for you to use aggregation:

  • By means of lambda expressions.
  • Through the built-in request object AggregationDictionary.
  • Simplify the use of AggregationDictionary by combining binary operators.

Suppose we have the following Project class:

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

The request commands in the three ways are shown below:

POST /project/_search?typed_keys=true
{
    "aggs": { //Keyword aggregations, can be abbreviated with aggs
        "average_quantity": { //The name of the aggregate
            "avg": { //The type of aggregation can be understood as equivalent to the aggregation function in sql server
                "field": "quantity" //Aggregation, which fields are aggregated
            }
        },
        "max_quantity": {
            "max": {
                "field": "quantity"
            }
        },
        "min_quantity": {
            "min": {
                "field": "quantity"
            }
        }
    }
}

lambda-method

Using aggregations via lambda expressions is a concise way

var searchResponse = _client.Search<Project>(s => s
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", avg => avg.Field(p => p.Quantity))
        .Min("min_quantity", avg => avg.Field(p => p.Quantity))
    )
);

Generally, when performing aggregation queries, _source is not needed, so when performing aggregation queries, you can specify size=0 in the query statement, so that only aggregated results will be returned, as follows:

var searchResponse = _client.Search<Project>(s => s
    .Size(0) //Explicitly specified as 0
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", avg => avg.Field(p => p.Quantity))
        .Min("min_quantity", avg => avg.Field(p => p.Quantity))
    )
);

By built-in object AggregationDictionary

The following code has the same effect as passing a lambda expression

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AggregationDictionary
    {
        {"average_quantity", new AverageAggregation("average_quantity", "quantity")},
        {"max_quantity", new MaxAggregation("max_quantity", "quantity")},
        {"min_quantity", new MinAggregation("min_quantity", "quantity")},
    }
};
var searchResponse = _client.Search<Project>(searchRequest);

Simplify the use of AggregationDictionary by combining binary operators

Binary operators can make the code more readable. The following code is equivalent to the above:

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AverageAggregation("average_quantity", "quantity")
     & amp; & amp;new MaxAggregation("max_quantity", "quantity")
     & amp; & amp;new MinAggregation("min_quantity", "quantity")
};
var searchResponse = _client.Search<Project>(searchRequest);

Practice of querying aggregate values

Multiple aggregation condition queries

When using NEST for aggregation queries in C#, you can use the AggregationContainer class to add multiple aggregation conditions.

First, you need to create an AggregationContainer object and add the aggregation conditions you need. For example, if you want to query the total sales volume and average price of items in an index, you can write the code like this:

var searchResponse = client.Search<Product>(s => s
    .Aggregations(a => a
        .Sum("total_sales", st => st
            .Field(f => f.Sales)
        )
        .Average("average_price", ap => ap
            .Field(f => f.Price)
        )
    )
);

In the above code, we create an AggregationContainer object and use the Sum and Average methods to add two aggregation conditions. The Sum method is used to calculate the total sales volume, and the Average method is used to calculate the average price. At the same time, we also specified the corresponding fields, Sales and Price.

Finally, we use the client.Search method to execute the query, and the query results will include aggregated results.

Please note that the above example uses a class named Product to represent the product. You need to replace it according to your own data structure. Also, you need to replace the Sales and Price fields in the example with their actual field names.

Aggregation query of multiple filter conditions

To use NEST to write C# code to generate the above Elasticsearch aggregation query statement, you can use the Filters method of the AggregationContainer class to add multiple filter conditions.

{"aggregations": {
"group": {
"filters": {
"filters": {
"pendingCount": {
"match": {
"dataBody.status": 0
}
},
"storedingCount": {
"match": {
"dataBody.status": 1
}
},
"finishedCount": {
"match": {
"dataBody.status": 2
}
},
"closedCount": {
"match": {
"dataBody.status": 3
}
},
"thisMonthCount": {
"range": {
"dataBody.creationTime": {
"gte": "2023-11-03"
}
}
}
}
}
}
}

The following is a sample code showing how to use NEST to generate the above aggregation query statement:

var searchResponse = client.Search<Document>(s => s
    .Aggregations(a => a
        .Filters("group", f => f
            .NamedFilters(flt => flt
                .Filter("pendingCount", q => q
                    .Match(m => m
                        .Field("dataBody.status")
                        .Query("0")
                    )
                )
                .Filter("storedingCount", q => q
                    .Match(m => m
                        .Field("dataBody.status")
                        .Query("1")
                    )
                )
                .Filter("finishedCount", q => q
                    .Match(m => m
                        .Field("dataBody.status")
                        .Query("2")
                    )
                )
                .Filter("closedCount", q => q
                    .Match(m => m
                        .Field("dataBody.status")
                        .Query("3")
                    )
                )
                .Filter("thisMonthCount", q => q
                    .DateRange(dr => dr
                        .Field("dataBody.creationTime")
                        .GreaterThanOrEquals("2023-11-03")
                    )
                )
            )
        )
    )
);

In the above code, we use the Filters method to create an AggregationContainer object and add multiple filter conditions. Each filter condition is added through the Filter method, and the Match or DateRange method is used to specify the fields and query conditions.

Please note that the above example uses a class named Document to represent the document. You need to replace it according to your own data structure. At the same time, you need to replace the dataBody.status and dataBody.creationTime fields in the example according to the actual field names.

Get the value in the aggregation result

To get the value in the aggregation result queried using the above code, you can use the Aggregations property of the AggregationContainer class to access the aggregation result. The aggregate name can then be used to obtain specific aggregate results.

Here is a sample code that shows how to get the value in the aggregate result and access it based on the field name:

var groupAggregation = searchResponse.Aggregations.Filters("group");

var pendingCount = groupAggregation.Buckets["pendingCount"].DocCount;
var storingCount = groupAggregation.Buckets["storingCount"].DocCount;
var finishedCount = groupAggregation.Buckets["finishedCount"].DocCount;
var closedCount = groupAggregation.Buckets["closedCount"].DocCount;
var thisMonthCount = groupAggregation.Buckets["thisMonthCount"].DocCount;

// or as follows
var pendingCount = ((SingleBucketAggregate)groupAggregation["pendingCount"]).DocCount;
var storingCount = ((SingleBucketAggregate)groupAggregation["storedingCount"]).DocCount;
var finishedCount = ((SingleBucketAggregate)groupAggregation["finishedCount"]).DocCount;
var closedCount = ((SingleBucketAggregate)groupAggregation["closedCount"]).DocCount;
var thisMonthCount = ((SingleBucketAggregate)groupAggregation["thisMonthCount"]).DocCount;

In the above code, we first use the Aggregations attribute to get the aggregation results, and then use the Filters method to get the aggregate named “group”. Next, we can get the results of each filter condition by accessing the Buckets property of the aggregated result. Using the aggregation name as the key, you can directly access the DocCount property of the aggregation result to get the number of matching documents.

Note that the type of aggregate results may be different depending on the type of aggregation you use in the query. The above example assumes that the aggregation result is a FiltersAggregation type. You may need to perform appropriate type conversions based on the actual aggregation type.