ElasticSearch aggregation
Aggregation in Elasticsearch is a mechanism for extracting and displaying data in a structured way. Think of it as the GROUP BY statement in SQL, but more powerful and flexible.
Aggregation type
Elasticsearch supports many types of aggregations, including:
- Metrics aggregations: Mainly used for statistics of indicators such as maximum value, minimum value, average value, and sum of fields. This type of aggregation calculates based on the numeric value of a document field and returns a single numeric result. For example, maximum value (max), minimum value (min), average value (average), sum (sum), statistical information (stats, including the above operations), and other complex aggregations such as percentages (percentiles), cardinality ( cardinality) etc.
- Bucket aggregations: Analogous to the role of group by in SQL, it is mainly used to count the number of different types of data. This type of aggregation creates a set of buckets, each bucket corresponding to a specific condition or range, and then documents are classified into the corresponding buckets based on these conditions or ranges. Common ones include range, date range, histogram, date histogram, geohash grid, etc.
- Pipeline aggregations: used for secondary aggregation of aggregation results. If you want to count the buckets with the largest number of bound tags, you must first bucket them according to the tags, and then calculate the maximum value on the bucketed results. Such aggregations can perform secondary calculations based on the results of other aggregations. For example, calculate differences, ratios, moving averages, etc.
Elasticsearch’s aggregation operation supports nesting, that is, an aggregation can contain other sub-aggregations, thereby realizing very complex data mining and statistical requirements.
Aggregation query syntax
{ "size" : 0, "aggregations" : { "Self-named aggregate name 1" : { "filter" : { "bool" : { "filter" : [ { "term" : { "Query field 1" : { "value": Query value 1, "boost" : 1.0 } } } ] } }, "aggregations" : { "Self-named aggregate name 2" : { "sum" : { "field" : "query value 2" } } } } } }
Write Aggregation
Nest provides 3 ways for you to use aggregation:
- By means of lambda expressions.
- Through the built-in request object AggregationDictionary.
- Simplify the use of AggregationDictionary by combining binary operators.
Suppose we have the following Project class:
public class Project { public string Name { get; set; } public int Quantity { get; set; } }
The request commands in the three ways are shown below:
POST /project/_search?typed_keys=true { "aggs": { //Keyword aggregations, can be abbreviated with aggs "average_quantity": { //The name of the aggregate "avg": { //The type of aggregation can be understood as equivalent to the aggregation function in sql server "field": "quantity" //Aggregation, which fields are aggregated } }, "max_quantity": { "max": { "field": "quantity" } }, "min_quantity": { "min": { "field": "quantity" } } } }
lambda-method
Using aggregations via lambda expressions is a concise way
var searchResponse = _client.Search<Project>(s => s .Aggregations(aggs => aggs .Average("average_quantity", avg => avg.Field(p => p.Quantity)) .Max("max_quantity", avg => avg.Field(p => p.Quantity)) .Min("min_quantity", avg => avg.Field(p => p.Quantity)) ) );
Generally, when performing aggregation queries, _source is not needed, so when performing aggregation queries, you can specify size=0 in the query statement, so that only aggregated results will be returned, as follows:
var searchResponse = _client.Search<Project>(s => s .Size(0) //Explicitly specified as 0 .Aggregations(aggs => aggs .Average("average_quantity", avg => avg.Field(p => p.Quantity)) .Max("max_quantity", avg => avg.Field(p => p.Quantity)) .Min("min_quantity", avg => avg.Field(p => p.Quantity)) ) );
By built-in object AggregationDictionary
The following code has the same effect as passing a lambda expression
var searchRequest = new SearchRequest<Project> { Size = 0, Aggregations = new AggregationDictionary { {"average_quantity", new AverageAggregation("average_quantity", "quantity")}, {"max_quantity", new MaxAggregation("max_quantity", "quantity")}, {"min_quantity", new MinAggregation("min_quantity", "quantity")}, } }; var searchResponse = _client.Search<Project>(searchRequest);
Simplify the use of AggregationDictionary by combining binary operators
Binary operators can make the code more readable. The following code is equivalent to the above:
var searchRequest = new SearchRequest<Project> { Size = 0, Aggregations = new AverageAggregation("average_quantity", "quantity") & amp; & amp;new MaxAggregation("max_quantity", "quantity") & amp; & amp;new MinAggregation("min_quantity", "quantity") }; var searchResponse = _client.Search<Project>(searchRequest);
Practice of querying aggregate values
Multiple aggregation condition queries
When using NEST for aggregation queries in C#, you can use the AggregationContainer
class to add multiple aggregation conditions.
First, you need to create an AggregationContainer
object and add the aggregation conditions you need. For example, if you want to query the total sales volume and average price of items in an index, you can write the code like this:
var searchResponse = client.Search<Product>(s => s .Aggregations(a => a .Sum("total_sales", st => st .Field(f => f.Sales) ) .Average("average_price", ap => ap .Field(f => f.Price) ) ) );
In the above code, we create an AggregationContainer
object and use the Sum
and Average
methods to add two aggregation conditions. The Sum
method is used to calculate the total sales volume, and the Average
method is used to calculate the average price. At the same time, we also specified the corresponding fields, Sales
and Price
.
Finally, we use the client.Search
method to execute the query, and the query results will include aggregated results.
Please note that the above example uses a class named Product
to represent the product. You need to replace it according to your own data structure. Also, you need to replace the Sales
and Price
fields in the example with their actual field names.
Aggregation query of multiple filter conditions
To use NEST to write C# code to generate the above Elasticsearch aggregation query statement, you can use the Filters
method of the AggregationContainer
class to add multiple filter conditions.
{"aggregations": { "group": { "filters": { "filters": { "pendingCount": { "match": { "dataBody.status": 0 } }, "storedingCount": { "match": { "dataBody.status": 1 } }, "finishedCount": { "match": { "dataBody.status": 2 } }, "closedCount": { "match": { "dataBody.status": 3 } }, "thisMonthCount": { "range": { "dataBody.creationTime": { "gte": "2023-11-03" } } } } } } }
The following is a sample code showing how to use NEST to generate the above aggregation query statement:
var searchResponse = client.Search<Document>(s => s .Aggregations(a => a .Filters("group", f => f .NamedFilters(flt => flt .Filter("pendingCount", q => q .Match(m => m .Field("dataBody.status") .Query("0") ) ) .Filter("storedingCount", q => q .Match(m => m .Field("dataBody.status") .Query("1") ) ) .Filter("finishedCount", q => q .Match(m => m .Field("dataBody.status") .Query("2") ) ) .Filter("closedCount", q => q .Match(m => m .Field("dataBody.status") .Query("3") ) ) .Filter("thisMonthCount", q => q .DateRange(dr => dr .Field("dataBody.creationTime") .GreaterThanOrEquals("2023-11-03") ) ) ) ) ) );
In the above code, we use the Filters
method to create an AggregationContainer
object and add multiple filter conditions. Each filter condition is added through the Filter
method, and the Match
or DateRange
method is used to specify the fields and query conditions.
Please note that the above example uses a class named Document
to represent the document. You need to replace it according to your own data structure. At the same time, you need to replace the dataBody.status
and dataBody.creationTime
fields in the example according to the actual field names.
Get the value in the aggregation result
To get the value in the aggregation result queried using the above code, you can use the Aggregations
property of the AggregationContainer
class to access the aggregation result. The aggregate name can then be used to obtain specific aggregate results.
Here is a sample code that shows how to get the value in the aggregate result and access it based on the field name:
var groupAggregation = searchResponse.Aggregations.Filters("group"); var pendingCount = groupAggregation.Buckets["pendingCount"].DocCount; var storingCount = groupAggregation.Buckets["storingCount"].DocCount; var finishedCount = groupAggregation.Buckets["finishedCount"].DocCount; var closedCount = groupAggregation.Buckets["closedCount"].DocCount; var thisMonthCount = groupAggregation.Buckets["thisMonthCount"].DocCount; // or as follows var pendingCount = ((SingleBucketAggregate)groupAggregation["pendingCount"]).DocCount; var storingCount = ((SingleBucketAggregate)groupAggregation["storedingCount"]).DocCount; var finishedCount = ((SingleBucketAggregate)groupAggregation["finishedCount"]).DocCount; var closedCount = ((SingleBucketAggregate)groupAggregation["closedCount"]).DocCount; var thisMonthCount = ((SingleBucketAggregate)groupAggregation["thisMonthCount"]).DocCount;
In the above code, we first use the Aggregations
attribute to get the aggregation results, and then use the Filters
method to get the aggregate named “group”. Next, we can get the results of each filter condition by accessing the Buckets
property of the aggregated result. Using the aggregation name as the key, you can directly access the DocCount
property of the aggregation result to get the number of matching documents.
Note that the type of aggregate results may be different depending on the type of aggregation you use in the query. The above example assumes that the aggregation result is a FiltersAggregation
type. You may need to perform appropriate type conversions based on the actual aggregation type.