Hi all,
I am currently seeing a problem where performance drops dramatically (latency goes from approximately 20 ms to 300 ms) when I run a geo_distance search in the 250-430 mile range. Any distance larger or smaller than that is fine (sub-100 ms), but that particular range is problematic.
I have been able to make the problem go away by using force merge to reduce the total number of segments to about 5 per shard, but the periodic changes written to the index (every 30 minutes) cause the segment count to drift back up to around 25 per shard, at which point the mid-range geo_distance searches start performing poorly again.
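For reference, this is roughly the force merge call I have been running (the index name is a placeholder):

```
# hypothetical index name; max_num_segments applies per shard
POST /my_index/_forcemerge?max_num_segments=5
```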
The problem is definitely due to the geo_distance query fragment (a sketch of the query shape follows this list):
- Adjusting the distance out of the problem range, or removing the fragment altogether, resolves the performance problem. The problem recurs when the distance is put back in the problem range.
- Profiling shows that most of the time is spent in `GeoPointTermQueryConstantScoreWrapper`. The timing is consistently slow across all nodes (i.e., it is not a single problem node).
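For illustration, the relevant part of the search looks roughly like the sketch below; the field name, coordinates, and distance are placeholders, and `"profile": true` is how I captured the timings:

```
# hypothetical index and field names; the distance shown is in the problem range
GET /my_index/_search
{
  "profile": true,
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "300mi",
          "location": { "lat": 40.7, "lon": -74.0 }
        }
      }
    }
  }
}
```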
I have been able to reproduce this on multiple clusters (dev/test/production/local workstation).
Relevant info:
- We are running Elasticsearch 2.3.2 with 3 master nodes, 3 client (query) nodes, and 5 data nodes; the index has 5 shards with 1 replica (10 shards total).
- This is also reproducible on a cluster with just 5 undifferentiated nodes (5 shards with 1 replica).
- The index contains 7M records and is written to periodically (every 30 minutes or so). About half the records will change over the course of a day, some batches larger than others.
- We do two types of updates:
- Full record upsert (several _bulk operations) + delete of older documents (using delete-by-query)
- Update date fields on the record (_bulk operations).
NOTE: each record is touched at most once every 4 hours (usually once per day)
- The geo_point field queried is declared with default settings (just `{ "type": "geo_point" }`).
- The index uses default settings except for `index.max_result_window`, which is set to 50000. (A concrete sketch of the mapping and settings follows this list.)
- On my local workstation I also tested with 2.3.5 and with the default `index.max_result_window` (10000), so I do not believe this setting is the cause of the problem.
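To be concrete, the mapping and the one non-default setting look roughly like this (the index, type, and field names are placeholders):

```
# hypothetical index/type/field names; everything else is left at defaults
PUT /my_index
{
  "settings": {
    "index.max_result_window": 50000
  },
  "mappings": {
    "my_type": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}
```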
My questions are:
- Is there a way to tune Elasticsearch so that queries do not perform so poorly for distances in this range?
- Is there a way to make the cluster perform automatic merges more frequently, or should I keep force merging? (The command I use to watch segment counts is sketched after this list.)
- Running frequent force merges on an index seems to be considered ill-advised. Is that still the case when doing bulk writes on a 30-minute basis?
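In case it is relevant, this is roughly how I watch the per-shard segment count drift between force merges (the index name is a placeholder):

```
# hypothetical index name; returns one row per segment per shard
GET /_cat/segments/my_index?v
```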
Thank you for your assistance!
# Chris