Elasticsearch: Geo Point and Geo Shape Query Explanation


In this article, we’ll take a look at Elasticsearch’s geo queries: how to set up mappings and indexes, and how to query the data, with some examples.

Geographic data in Elasticsearch

Elasticsearch allows you to represent GeoData in two ways: geo_shape and geo_point.

Geo Point allows you to store data as latitude and longitude coordinate pairs. Use this field type when you want to filter data for distance between points, search within a bounding box, or use aggregation. You can specify many features and options that are beyond the scope of this article. We’ll cover a few here, but you can check out the options for geographic bounding boxes, geographic distances, and geographic aggregation in Elasticsearch’s documentation.

Use Geo Shape when you have GeoData representing a shape, or when you want to query for points within a shape. geo_shape data must be encoded in GeoJSON (or Well-Known Text) format, which is converted to a string representing longitude/latitude coordinate pairs on a Geohash cell grid. Since Elasticsearch indexes shapes as terms, it is easy to determine relationships between shapes, which can be queried using the intersects, disjoint, contains, or within spatial relation operators. For more on geohash, see “Elasticsearch: Understanding geohash and aggregation in Elastic Maps”.

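In older Elasticsearch versions, geo_point and geo_shape could not be queried together. For example, if you wanted to get all the cities within a specified polygon, you could not use cities indexed as geo_point; they had to be indexed as a geo_shape with “type”: “point” in GeoJSON. In recent versions, including the 8.x release used in this article, the geo_shape query also works on geo_point fields, as the polygon example later in this article shows.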

Geo Point field type

Fields of type geo_point accept latitude-longitude pairs, which can be used to:

  • Find geographic points within a bounding box, a certain distance from a center point, within a polygon, or within a geo_shape query.
  • Aggregate documents geographically or by distance from a central point.
  • Integrate distance into a document’s relevance score.
  • Sort documents by distance.

Geo point mapping

We can define an index with the geo_point data type as follows:

PUT location_index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

We can store Geo Points in five different ways.

Geo point as an object

Objects can be used with attributes such as lat and lon.

PUT location_index/_doc/1
{
  "text": "Geopoint as an object",
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

Geo Point as a string

A plain string in the format lat,lon, with the two values separated by a comma.

PUT location_index/_doc/2
{
  "text": "Geopoint as a string",
  "location": "41.12,-71.34"
}

Geo Point as Geohash

A geohash string encodes the lat and lon values. Online converters can produce one from a coordinate pair:

PUT location_index/_doc/3
{
  "text": "Geopoint as a geohash",
  "location": "drm3btev3e86"
}
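
As an alternative to an online converter, the hash can be computed locally. The following is a minimal Python sketch of the standard geohash algorithm; the function names geohash_encode and geohash_decode are illustrative and not part of any Elasticsearch client.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:      # upper half of the interval: emit a 1 bit
            value = (value << 1) | 1
            rng[0] = mid
        else:                 # lower half of the interval: emit a 0 bit
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:         # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_decode(geohash):
    """Return the (lat, lon) centre of the cell named by a geohash."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for char in geohash:
        index = BASE32.index(char)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)
```

Decoding "drm3btev3e86" with this sketch lands close to the lat 41.12, lon -71.34 used in the other examples; a shorter hash names a larger cell, so precision drops as characters are removed.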

Geo point as an array

Coordinates can be expressed in the form of an array [lon, lat] with double values.

PUT location_index/_doc/4
{
  "text": "Geopoint as an array",
  "location": [ -71.34, 41.12 ]
}

Geo Point as WKT Point

Coordinates can be expressed as a Well-Known Text (WKT) point in the form POINT (lon lat).

PUT location_index/_doc/5
{
  "text": "Geopoint as a WKT POINT primitive",
  "location": "POINT (-71.34 41.12)"
}

Note: Regardless of the format in which a geo point is stored, we can query it using any of the other formats. But be careful with the coordinate order: the string form is lat,lon, while the array and WKT forms are lon, lat. Do not swap the lat and lon values, as this gives unexpected results.
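
To make the ordering rules concrete, here is a small Python sketch that normalizes four of the formats above to a (lat, lon) tuple. The helper name to_lat_lon is hypothetical; it only mirrors the conventions described in this section and does not handle geohash strings.

```python
import re

# WKT point, e.g. "POINT (-71.34 41.12)": longitude first, then latitude
WKT_POINT = re.compile(r"\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*", re.IGNORECASE)


def to_lat_lon(value):
    """Normalize a geo_point value to a (lat, lon) tuple.

    Handles the object, "lat,lon" string, [lon, lat] array and WKT POINT
    forms. Geohash strings are not handled in this sketch.
    """
    if isinstance(value, dict):             # {"lat": ..., "lon": ...}
        return (float(value["lat"]), float(value["lon"]))
    if isinstance(value, (list, tuple)):    # [lon, lat]
        lon, lat = value
        return (float(lat), float(lon))
    if isinstance(value, str):
        wkt = WKT_POINT.fullmatch(value)
        if wkt:                             # POINT (lon lat)
            return (float(wkt.group(2)), float(wkt.group(1)))
        lat, lon = value.split(",")         # "lat,lon"
        return (float(lat), float(lon))
    raise TypeError(f"unsupported geo_point value: {value!r}")
```

All four sample documents above normalize to the same tuple, which is a quick way to check that lat and lon have not been swapped before indexing.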

Geo shape field type

The geo_shape data type facilitates indexing and searching on arbitrary geographic shapes such as rectangles and polygons. It should be used when the data being indexed or the query being performed contains shapes rather than just points.

You can query documents of this type with the geo_shape query.

Geo shape mapping

PUT geo_shape_indx
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

We have several ways to store geo_shape data as follows.

Geo Json type POINT

A single geographic coordinate. Note: Elasticsearch only uses WGS-84 coordinates.

POST geo_shape_indx/_doc/1
{
  "location": {
    "type": "point",
    "coordinates": [-77.03653, 38.897676]
  }
}

Geo Json type LINESTRING

An arbitrary line given two or more points.

POST geo_shape_indx/_doc/2
{
  "location": {
    "type": "linestring",
    "coordinates": [[-77.03653, 38.897676], [-77.009051, 38.889939]]
  }
}

Geo Json type POLYGON

A closed polygon whose start and end points must match, thus requires n + 1 vertices to create an n-sided polygon, and requires a minimum of 4 vertices.

POST geo_shape_indx/_doc/3
{
  "location": {
    "type": "polygon",
    "coordinates": [
      [ [-77.03653, 38.897676], [-77.03653, 37.897676], [-76.03653, 38.897676], [-77.03653, 38.997676], [-77.03653, 38.897676] ]
    ]
  }
}
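
The closure rule is easy to get wrong when rings are built by hand. The following Python sketch checks a ring before it is sent to Elasticsearch; validate_ring and close_ring are hypothetical helper names, and the checks only cover the rules stated above plus the WGS-84 coordinate ranges.

```python
def validate_ring(ring):
    """Return a list of problems found in a polygon ring of [lon, lat] positions."""
    problems = []
    if len(ring) < 4:
        problems.append("a ring needs at least 4 positions")
    if ring and ring[0] != ring[-1]:
        problems.append("first and last positions must be identical")
    for lon, lat in ring:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"position out of WGS-84 range: [{lon}, {lat}]")
    return problems


def close_ring(ring):
    """Return the ring unchanged if closed, else a copy with the first position appended."""
    return ring if ring[0] == ring[-1] else ring + [ring[0]]


# The ring from the polygon document above
RING = [[-77.03653, 38.897676], [-77.03653, 37.897676], [-76.03653, 38.897676],
        [-77.03653, 38.997676], [-77.03653, 38.897676]]
```

Calling validate_ring(RING) returns an empty list, while an open three-point ring is reported until it has been passed through close_ring.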

Geo Json type MULTIPOLYGON

A set of individual polygons:

POST geo_shape_indx/_doc/4
{
  "location": {
    "type": "MultiPolygon",
    "coordinates": [
      [ [[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]] ],
      [ [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
        [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]] ]
    ]
  }
}

Geo Json type MULTIPOINT

A set of unconnected but possibly related points:

POST geo_shape_indx/_doc/5
{
  "location": {
    "type": "multipoint",
    "coordinates": [
      [-78.0, 38.0], [-79.0, 38.0]
    ]
  }
}

Geo Json type MULTILINESTRING

A set of individual line strings:

POST geo_shape_indx/_doc/6
{
  "location": {
    "type": "multilinestring",
    "coordinates": [
      [ [-77.03, 38.89], [-78.03, 38.89], [-78.03, 39.89], [-78.03, 39.89] ],
      [ [-76.03, 36.89], [-77.03, 36.89], [-77.03, 37.89], [-76.03, 37.89] ],
      [ [-76.23, 36.69], [-76.03, 36.89], [-76.23, 36.89], [-76.23, 36.09] ]
    ]
  }
}

Geo Json type GEOMETRYCOLLECTION

A collection of GeoJSON shapes, similar to the multi* shapes, except that multiple types can coexist (for example, a Point and a LineString).

POST geo_shape_indx/_doc/7
{
  "location": {
    "type": "geometrycollection",
    "geometries": [
      {
        "type": "point",
        "coordinates": [-77.03653, 38.897676]
      },
      {
        "type": "linestring",
        "coordinates": [[-77.03653, 38.897676], [-77.009051, 38.889939]]
      }
    ]
  }
}

Geo Json type BBOX (ENVELOPE in Elastic Search)

A bounding rectangle, or envelope, defined by only its upper-left and lower-right points, in the order [[minLon, maxLat], [maxLon, minLat]].

POST geo_shape_indx/_doc/8
{
  "location": {
    "type": "envelope",
    "coordinates": [ [-77.03653, 38.897676], [-76.03653, 37.897676] ]
  }
}
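
Since envelope is an Elasticsearch-specific type, other GeoJSON tools may not understand it. As a rough illustration, the Python sketch below expands an envelope into an equivalent closed polygon ring. The function name envelope_to_polygon is hypothetical, and the sketch assumes the envelope does not cross the antimeridian.

```python
def envelope_to_polygon(envelope):
    """Expand an envelope ([[minLon, maxLat], [maxLon, minLat]]) into a polygon."""
    (min_lon, max_lat), (max_lon, min_lat) = envelope["coordinates"]
    ring = [
        [min_lon, min_lat],  # bottom-left
        [max_lon, min_lat],  # bottom-right
        [max_lon, max_lat],  # top-right
        [min_lon, max_lat],  # top-left
        [min_lon, min_lat],  # back to the start to close the ring
    ]
    return {"type": "polygon", "coordinates": [ring]}


# The envelope from the document above
ENVELOPE = {"type": "envelope",
            "coordinates": [[-77.03653, 38.897676], [-76.03653, 37.897676]]}
```

The two corner points become four distinct vertices plus the closing position, which matches the minimum of four positions required for a polygon ring.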

By using geo_point or geo_shape, Elasticsearch will automatically find the coordinates, validate them according to the required format, and index them.

Load data into Elasticsearch

Install Elasticsearch and Kibana

If you have not installed your own Elasticsearch and Kibana, please refer to my previous article “Elasticsearch: How to run Elasticsearch 8.x on Docker for local development”. We use docker-compose to install. For the convenience of testing, we will not use security configuration. You can also refer to the “How to configure Elasticsearch without security” section in the article “Elastic Stack 8.0 Installation – Securing your Elastic Stack is now easier than ever” to install it. In our test today, we will use the latest Elastic Stack 8.6.2 for testing.

The data we’ll use in this walkthrough is taken from the Washington State Department of Transportation (WSDOT) Geodata Catalog. Download the shapefiles for “City Points” and “WSDOT Regions 24k”. City Points will give us the cities of Washington, and WSDOT Regions will give us the regions specified by WSDOT. You can view the data by clicking View next to the download link before downloading. I have converted the shapefile to GeoJSON format.

I created a Node.js application to create the indexes and load the data. Follow the steps in the README file of the GitHub repository to load the data. We use the following command to download the code:

git clone https://github.com/liu-xiao-guo/node_playground

$ pwd
/Users/liuxg/nodejs/node_playground/elastic-geo-spatial
$ ls
README.md           cities.json         counties.json       package-lock.json
cities.js           counties.js         docker-compose.yaml package.json
$ npm install
npm notice Beginning October 4, 2021, all connections to the npm registry - including for package installation - must use TLS 1.2 or higher. You are currently using plaintext http to connect. Please visit the GitHub blog for more information: https://github.blog/2021-08-23-npm-registry-deprecating-tls-1-0-tls-1-1/

added 6 packages in 2s
npm notice
npm notice New major version of npm available! 8.19.2 -> 9.6.2
npm notice Changelog: https://github.com/npm/cli/releases/tag/v9.6.2
npm notice Run npm install -g npm@9.6.2 to update!
npm notice

We run the following command to write the index geo_cities_point:

node cities.js

We run the following command to write the index geo_cities_shapes:

node counties.js

We use the following command to view the latest written index:

GET _cat/indices

We can view the mapping of the two indexes through the following command:

GET geo_cities_point/_mapping

{
  "geo_cities_point": {
    "mappings": {
      "properties": {
        "GNIS": {
          "type": "integer"
        },
        "location": {
          "type": "geo_point"
        },
        "name": {
          "type": "text"
        },
        "objectId": {
          "type": "integer"
        }
      }
    }
  }
}

The above shows that the location field is of type geo_point. We use the following command to view the mapping of geo_cities_shapes:

GET geo_cities_shapes/_mapping

{
  "geo_cities_shapes": {
    "mappings": {
      "properties": {
        "location": {
          "type": "geo_shape"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }
}

The above shows that the location field is a geo_shape type.

Geo POINT query

Elasticsearch uses the terms “query” and “filter”. Queries compute a relevance score: not only whether a document matches, but how well it matches. Filters, on the other hand, are non-scoring and only determine whether a document matches. According to Elasticsearch, the two have become largely interchangeable since 2.x, because a query clause can run in either a scoring or a non-scoring context. Each has performance advantages and disadvantages, but the rule of thumb is to use scoring queries when relevance scores are important, and non-scoring queries for everything else.

Now that we have some data in our index, it’s time to start querying. We’ll look at some basic queries available with geo_point and geo_shape.

Distance from geographic point

To filter by the distance from a point, we use the geo_distance query, which matches geo_point and geo_shape values within a given distance of a geo point. The documentation provides examples in various data formats. The following query lists the locations within 10 miles of the given point, written here as a [lon, lat] array.

GET geo_cities_point/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "10mi",
          "location": [
            -122.3375,
            47.6112
          ]
        }
      }
    }
  }
}
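
To get a feel for what a distance filter of this kind computes, here is a Python sketch based on the haversine great-circle formula. It is an illustration only: the Earth radius constant and the helper names are assumptions of the sketch, and Elasticsearch's internal distance calculation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point, center, miles):
    """True if point (lat, lon) lies within the given number of miles of center."""
    return haversine_km(*point, *center) <= miles * KM_PER_MILE


CENTER = (47.6112, -122.3375)  # the (lat, lon) used in the query above
```

A point one degree of latitude north of CENTER is roughly 111 km (about 69 miles) away, so within_distance rejects it for a 10-mile radius.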

Geo Distance Aggregation

A multi-bucket aggregation for geo_point fields, very similar in concept to the range aggregation. Users can define an origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin and determines which bucket it belongs to based on the range (a document belongs to a bucket if its distance from the origin is within the distance range of the bucket).

Sometimes we need to know how many points fall within each distance range. The following aggregation counts the documents up to 10 miles, from 10 to 50 miles, from 50 to 100 miles, and more than 100 miles from the origin.

GET geo_cities_point/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "data_around_city": {
      "geo_distance": {
        "unit": "mi",
        "field": "location",
        "origin": "47.6112, -122.3375",
        "ranges": [
          {
            "to": 10
          },
          {
            "from": 10,
            "to": 50
          },
          {
            "from": 50,
            "to": 100
          },
          {
            "from": 100
          }
        ]
      }
    }
  }
}

The return value of the above command is:

{
  "aggregations": {
    "data_around_city": {
      "buckets": [
        {
          "key": "*-10.0",
          "from": 0,
          "to": 10,
          "doc_count": 12
        },
        {
          "key": "10.0-50.0",
          "from": 10,
          "to": 50,
          "doc_count": 77
        },
        {
          "key": "50.0-100.0",
          "from": 50,
          "to": 100,
          "doc_count": 57
        },
        {
          "key": "100.0-*",
          "from": 100,
          "doc_count": 135
        }
      ]
    }
  }
}
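
The bucketing rule can be sketched in a few lines of Python. The sketch assumes the usual range-aggregation convention that from is inclusive and to is exclusive; the helper name bucket_keys is hypothetical, and the key format simply imitates the response above.

```python
# The ranges from the aggregation request above, in miles
RANGES = [{"to": 10}, {"from": 10, "to": 50}, {"from": 50, "to": 100}, {"from": 100}]


def _label(bound):
    """Format one side of a bucket key; an open side is shown as '*'."""
    return "*" if bound is None else str(float(bound))


def bucket_keys(distance, ranges=RANGES):
    """Return the keys of every bucket whose range contains the distance."""
    keys = []
    for r in ranges:
        lower, upper = r.get("from"), r.get("to")
        # "from" is inclusive and "to" is exclusive
        if (lower is None or distance >= lower) and (upper is None or distance < upper):
            keys.append(f"{_label(lower)}-{_label(upper)}")
    return keys
```

Because to is exclusive, a document exactly 10 miles from the origin is counted in the 10.0-50.0 bucket rather than in *-10.0.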

Geographic points in geographic polygons

We can also write a query that returns only the points that fall within a polygon:

GET geo_cities_point/_search?filter_path=**.hits
{
  "_source": false,
  "fields": [
    "objectId",
    "name"
  ],
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [
                [
                  [-122.35610961914062, 47.70514099299205],
                  [-122.48519897460936, 47.5626274374099],
                  [-122.28744506835938, 47.44852243794931],
                  [-122.15972900390624, 47.558920607496525],
                  [-122.2283935546875, 47.719001413201916],
                  [-122.35610961914062, 47.70514099299205]
                ]
              ]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}


The above query returns results:

{
  "hits": {
    "hits": [
      {
        "_index": "geo_cities_point",
        "_id": "83",
        "_score": 1,
        "fields": {
          "name": ["Seattle"],
          "objectId": [83]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "97",
        "_score": 1,
        "fields": {
          "name": ["Bellevue"],
          "objectId": [97]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "101",
        "_score": 1,
        "fields": {
          "name": ["Yarrow Point"],
          "objectId": [101]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "102",
        "_score": 1,
        "fields": {
          "name": ["Hunts Point"],
          "objectId": [102]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "103",
        "_score": 1,
        "fields": {
          "name": ["Medina"],
          "objectId": [103]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "104",
        "_score": 1,
        "fields": {
          "name": ["Clyde Hill"],
          "objectId": [104]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "108",
        "_score": 1,
        "fields": {
          "name": ["Mercer Island"],
          "objectId": [108]
        }
      },
      {
        "_index": "geo_cities_point",
        "_id": "110",
        "_score": 1,
        "fields": {
          "name": ["Beaux Arts"],
          "objectId": [110]
        }
      }
    ]
  }
}
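
Conceptually, the polygon filter above is a point-in-polygon test. The Python sketch below uses the classic ray-casting method on plain lon/lat values; it treats the coordinates as planar, ignores holes and points lying exactly on an edge, and is not how Elasticsearch evaluates the query internally. The name point_in_ring is hypothetical.

```python
# The polygon ring from the query above, as [lon, lat] positions
POLYGON = [
    [-122.35610961914062, 47.70514099299205],
    [-122.48519897460936, 47.5626274374099],
    [-122.28744506835938, 47.44852243794931],
    [-122.15972900390624, 47.558920607496525],
    [-122.2283935546875, 47.719001413201916],
    [-122.35610961914062, 47.70514099299205],
]


def point_in_ring(lon, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # the edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:         # the crossing lies east of the point
                inside = not inside
    return inside
```

With this ring, the point lon -122.3375, lat 47.6112 used in the distance examples is reported as inside, while a point further south such as lon -122.44, lat 47.25 is not.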

Geographic points within a geographic bounding box

The geo_bounding_box query matches geo_point and geo_shape values that intersect a bounding box. When geohashes are used to specify the corners of the bounding box, they are treated as rectangles: the upper-left corner of the bounding box is the upper-left corner of the geohash specified in the top_left parameter, and its lower-right corner is the lower-right corner of the geohash specified in the bottom_right parameter.

Geographic points (geo_point) have limited precision and are always rounded down at index time. During querying, the upper boundary of the bounding box is rounded down, and the lower boundary is rounded up. Therefore, points on the lower boundary (the bottom and left edge of the bounding box) may not fit into the bounding box due to rounding errors. At the same time, the query may select points next to the upper boundary (top and right edge), even if they are slightly outside the edge. The rounding error should be less than 4.20e-8 degrees for latitude and less than 8.39e-8 degrees for longitude, which means less than 1 centimeter error even at the equator.

GET geo_cities_point/_search?filter_path=**.hits
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 47.7328,
              "lon": -122.448
            },
            "bottom_right": {
              "lat": 47.468,
              "lon": -122.0924
            }
          }
        }
      }
    }
  }
}
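
For a single point, the bounding-box condition reduces to two range checks. The Python sketch below ignores the index-time rounding described above and assumes the box does not cross the antimeridian; in_bounding_box is a hypothetical helper name.

```python
# The corners from the query above
TOP_LEFT = {"lat": 47.7328, "lon": -122.448}
BOTTOM_RIGHT = {"lat": 47.468, "lon": -122.0924}


def in_bounding_box(lat, lon, top_left=TOP_LEFT, bottom_right=BOTTOM_RIGHT):
    """True if the point lies between the two corners on both axes."""
    return (bottom_right["lat"] <= lat <= top_left["lat"]
            and top_left["lon"] <= lon <= bottom_right["lon"])
```

The point lat 47.6112, lon -122.3375 satisfies both checks, so a document stored at that location would be a candidate for this query.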

Geo Shape query

Geo shape queries require your data to be mapped as geo_shape or geo_point. Using geo shapes, we can find documents that intersect the query shape.

Geo Shape Query

The geo_shape query filters documents indexed with the geo_shape or geo_point type. A geo_shape mapping or geo_point mapping is required.

The geo_shape query uses the same grid square representation as the geo_shape mapping to find documents with a shape that intersects the query shape. It also uses the same prefix tree configuration defined for the field mapping. The query supports two ways of defining the query shape: by providing the entire shape definition, or by referencing the name of a shape pre-indexed in another index. Both formats are shown below with examples.

Spatial relationship

The geo_shape strategy mapping parameter determines which spatial relation operators can be used when searching. Here is the full list of spatial relation operators available when searching geographic fields:

  • INTERSECTS – (default) Returns all documents whose geo_shape or geo_point fields intersect the query geometry.
  • DISJOINT – Returns all documents whose geo_shape or geo_point fields have nothing in common with the query geometry.
  • WITHIN – Returns all documents whose geo_shape or geo_point fields are within the query geometry. Line geometry is not supported.
  • CONTAINS – Returns all documents whose geo_shape or geo_point fields contain the query geometry.

For example:

GET geo_cities_shapes/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "envelope",
              "coordinates": [
                [
                  -122.35610961914062,
                  47.70514099299205
                ],
                [
                  -122.2283935546875,
                  47.01900141320191
                ]
              ]
            },
            "relation": "disjoint"
          }
        }
      }
    }
  }
}
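
The four relations listed above are easiest to see with simple rectangles. The Python sketch below classifies a document box against a query box, both given as (min_lon, min_lat, max_lon, max_lat). It is a simplified illustration for axis-aligned boxes only; the function name relations is hypothetical, and real shapes need a geometry library.

```python
# The envelope from the query above as (min_lon, min_lat, max_lon, max_lat)
QUERY_BOX = (-122.35610961914062, 47.01900141320191,
             -122.2283935546875, 47.70514099299205)


def relations(doc, query=QUERY_BOX):
    """Return the set of relation names that hold between doc and query boxes."""
    d_min_lon, d_min_lat, d_max_lon, d_max_lat = doc
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = query
    overlaps = (d_min_lon <= q_max_lon and q_min_lon <= d_max_lon
                and d_min_lat <= q_max_lat and q_min_lat <= d_max_lat)
    if not overlaps:
        return {"disjoint"}
    found = {"intersects"}
    # within: the document box lies entirely inside the query box
    if (q_min_lon <= d_min_lon and d_max_lon <= q_max_lon
            and q_min_lat <= d_min_lat and d_max_lat <= q_max_lat):
        found.add("within")
    # contains: the query box lies entirely inside the document box
    if (d_min_lon <= q_min_lon and q_max_lon <= d_max_lon
            and d_min_lat <= q_min_lat and q_max_lat <= d_max_lat):
        found.add("contains")
    return found
```

In these terms, the disjoint query above keeps only the documents whose shape shares no area with the envelope.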

Pre-indexed shapes

The query also supports using shapes that are already indexed in another index. This is especially useful when you have a predefined list of shapes and you want to refer to that list using a logical name (such as New Zealand) rather than providing coordinates each time. In this case, just provide:

  • id – The ID of the document containing the pre-indexed shape.
  • index – The name of the index where the pre-indexed shape resides. Defaults to shapes.
  • path – The field, specified as a path, that contains the pre-indexed shape. Defaults to shape.
  • routing – The routing of the shape document (if required).

PUT shapes
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

PUT shapes/_doc/test
{
  "location": {
    "type": "envelope",
    "coordinates": [
      [
        -122.35610961914062,
        47.70514099299205
      ],
      [
        -122.2283935546875,
        47.01900141320191
      ]
    ]
  }
}

We can do a search like this:

GET geo_cities_shapes/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "index": "shapes",
              "id": "test",
              "path": "location"
            }
          }
        }
      }
    }
  }
}

Above, our shape information is obtained from the shapes index.
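
If the request bodies are built in application code, the two ways of supplying a query shape can share one helper. The Python sketch below only assembles the JSON body as a dictionary. The function name geo_shape_filter is hypothetical, and the index and path defaults repeat the ones listed above.

```python
def geo_shape_filter(field, shape=None, indexed_id=None, index="shapes",
                     path="shape", routing=None, relation="intersects"):
    """Build a geo_shape filter clause from an inline or a pre-indexed shape."""
    if (shape is None) == (indexed_id is None):
        raise ValueError("provide exactly one of shape or indexed_id")
    if shape is not None:
        body = {"shape": shape}
    else:
        body = {"indexed_shape": {"index": index, "id": indexed_id, "path": path}}
        if routing is not None:
            body["indexed_shape"]["routing"] = routing
    body["relation"] = relation
    return {"geo_shape": {field: body}}
```

For instance, geo_shape_filter("location", indexed_id="test", path="location") produces the same indexed_shape reference as the search above, with the default intersects relation spelled out explicitly.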