I've definitely encountered this issue as well and am able to reproduce it with test data. The following comparisons were run against v2.1.2, v2.2.0, and 5.0.0 (built from source).
The test data is 6 million documents with random locations on the following index (single shard, query cache disabled). I'm running Elasticsearch with the stock configs on my development machine.
{
  "test": {
    "mappings": {
      "document": {
        "properties": {
          "id": {
            "type": "integer"
          },
          "location": {
            "type": "geo_point",
            "lat_lon": true
          }
        }
      }
    }
  }
}
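For anyone who wants to build a similar data set, here is a minimal sketch of how random documents for this mapping could be generated as `_bulk` request lines. It is not the script I used. The function names (`random_document`, `bulk_lines`), the fixed seed, and the uniform sampling of latitude and longitude are assumptions made for illustration. Uniform sampling in degrees is not uniform over the sphere, so points cluster toward the poles.

```python
import json
import random


def random_document(doc_id, rng):
    """Return one document matching the mapping above, with a random location."""
    return {
        "id": doc_id,
        "location": {
            # Assumption: uniform in degrees, which over-samples the polar regions.
            "lat": rng.uniform(-90.0, 90.0),
            "lon": rng.uniform(-180.0, 180.0),
        },
    }


def bulk_lines(start_id, count, rng, index="test", doc_type="document"):
    """Yield newline-delimited JSON for the _bulk API: an action line, then the source."""
    for doc_id in range(start_id, start_id + count):
        yield json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}})
        yield json.dumps(random_document(doc_id, rng))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the data set is repeatable
    print("\n".join(bulk_lines(0, 3, rng)))
```

The output would be sent to the `_bulk` endpoint in batches, with a trailing newline after the last line of each batch.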
The geo_distance queries look like:
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "100mi",
          "location": "[lat], [lon]"
        }
      }
    }
  }
}
Here are the average times when performing the same 100 geo_distance searches against the different Elasticsearch versions:
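| Radius | 2.1.2 | 2.2.0   | 5.0.0 |
|--------|-------|---------|-------|
| 1mi    | 256ms | 1ms     | 2ms   |
| 10mi   | 206ms | 10ms    | 2ms   |
| 30mi   | 236ms | 36ms    | 3ms   |
| 50mi   | 252ms | 277ms   | 40ms  |
| 100mi  | 258ms | 461ms   | 74ms  |
| 250mi  | 241ms | 2,088ms | 315ms |
| 500mi  | 227ms | 3,534ms | 643ms |
| 1000mi | 223ms | 943ms   | 339ms |
| 2000mi | 280ms | 754ms   | 683ms |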
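A timing loop for this kind of comparison might look roughly like the sketch below. It is an illustration, not the harness behind the numbers above. `geo_distance_query` and `average_ms` are hypothetical helper names, and `run_search` stands for whatever callable sends a body to `/test/_search` (an HTTP client wrapper, for example). Measuring wall-clock time on the client side is also an assumption; the `took` field of the search response would be another option.

```python
import time


def geo_distance_query(lat, lon, distance):
    """Build the filter query shown earlier for one center point and radius."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": "%s, %s" % (lat, lon),
                    }
                }
            }
        }
    }


def average_ms(run_search, bodies):
    """Run each query body once and return the mean wall-clock time in milliseconds."""
    total = 0.0
    for body in bodies:
        started = time.perf_counter()
        run_search(body)  # hypothetical callable that executes the search
        total += time.perf_counter() - started
    return 1000.0 * total / len(bodies)
```

Reusing the same list of center points for every radius and every Elasticsearch version keeps the rows of the table comparable.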
I was also curious about bounding box performance, so I ran the same queries using a bounding box that circumscribes the circle of the geo_distance query:
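| Radius | 2.1.2 | 2.2.0 | 5.0.0 |
|--------|-------|-------|-------|
| 1mi    | 211ms | 23ms  | 9ms   |
| 10mi   | 217ms | 72ms  | 13ms  |
| 30mi   | 221ms | 60ms  | 18ms  |
| 50mi   | 221ms | 91ms  | 23ms  |
| 100mi  | 220ms | 89ms  | 32ms  |
| 250mi  | 223ms | 130ms | 54ms  |
| 500mi  | 231ms | 168ms | 81ms  |
| 1000mi | 231ms | 156ms | 96ms  |
| 2000mi | 226ms | 143ms | 100ms |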
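One way to derive a box that circumscribes a geo_distance circle is sketched below. This is an approximation under stated assumptions rather than the exact code I ran: it treats the Earth as a sphere with a mean radius of 3958.8 miles, the helper names (`bounding_box`, `geo_bounding_box_query`) are made up for this example, and it does not handle boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumption: mean Earth radius, spherical model


def bounding_box(lat, lon, radius_mi):
    """Return (top, left, bottom, right) in degrees for the box around the circle."""
    ang = radius_mi / EARTH_RADIUS_MI  # angular radius in radians
    lat_r = math.radians(lat)
    top = lat_r + ang
    bottom = lat_r - ang
    if top >= math.pi / 2 or bottom <= -math.pi / 2:
        # The circle reaches a pole, so it spans every longitude.
        top = min(top, math.pi / 2)
        bottom = max(bottom, -math.pi / 2)
        left, right = -180.0, 180.0
    else:
        # Largest longitude offset reached by the circle at this latitude.
        dlon = math.degrees(math.asin(math.sin(ang) / math.cos(lat_r)))
        left, right = lon - dlon, lon + dlon  # not wrapped at +/-180
    return math.degrees(top), left, math.degrees(bottom), right


def geo_bounding_box_query(lat, lon, radius_mi, field="location"):
    """Wrap the box in a bool/filter query like the geo_distance one."""
    top, left, bottom, right = bounding_box(lat, lon, radius_mi)
    box = {
        "top_left": {"lat": top, "lon": left},
        "bottom_right": {"lat": bottom, "lon": right},
    }
    return {"query": {"bool": {"filter": {"geo_bounding_box": {field: box}}}}}
```

Because the box covers more area than the circle, it matches at least as many documents as the corresponding geo_distance query, which is worth keeping in mind when comparing the two tables.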
On 2.2.0 and 5.0.0, the slowest queries are always the ones that match the most documents; queries over areas with few matching documents run quickly.
I'm not sure what to make of the v2.2.0 performance at 250mi and 500mi, but with timings like those, running such searches on 2.2.0 is not an option for us. 5.0.0 looks to be an improvement over 2.2.0 but still lags behind 2.1.2 for larger-radius searches.