I have a cluster holding geographical data that used to run version 5.3.2. We had all kinds of issues with GC and the Java transport client, so we upgraded to 5.6.10.
After the upgrade we saw many improvements, but then found a problem with some of our queries that used to work.
The query looks something like this:
//I don't need scoring...
// a small range of minutes
// some terms to filter by
// some extra terms filter
// array of coordinates
I know the query is not built perfectly, but this is what the query builder generated. In any case, we tried restructuring it and saw no change.
The query takes up to 70-80 seconds to run, which is far above the timeout (regardless of whether it runs from the application or straight from an HTTP client).
We thought the issue was caused by the upgrade, so we created a cluster on 5.3.2 to run the query there, and it behaved the same way.
So we checked whether we had missed any changes in the data, but nothing stood out.
We then turned to the query itself but couldn't find anything. We used the search profiler to check the query and "surprisingly" the issue is with the geo_shape query. It takes 99.8 percent of the total running time.
It's worth mentioning that we do have geo queries that work just fine, but only on smaller polygons.
So... I'd like to hear if anyone encountered something similar or can help with a lead...
Also, we see that the running time is almost unaffected by the number of filters we add (in an attempt to shrink the dataset). Is the geo_shape query affected by the filters at all (for example, I know wildcard queries run independently)?
Some extra info:
- The query is performed on one index with an avg of 1TB of data. The index has 40 shards and 2 replicas (with sufficient cores and RAM to handle everything). Other queries run just fine with a search time of under 2s.
- We use geohash with tree level of 7.
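
To give a feel for why polygon size might matter at this tree level, here is a rough back-of-the-envelope sketch in Python. It only uses the standard geohash layout (5 bits per character, longitude gets the first bit) to estimate how many level-7 cells fit in a bounding box. The function names are made up for illustration, and it ignores whatever optimizations the prefix tree applies, so treat it as crude intuition rather than a description of what the geo_shape query actually does.

```python
import math


def geohash_cell_size_deg(level):
    """Return (width, height) in degrees of one geohash cell at this level."""
    bits = 5 * level              # each geohash character encodes 5 bits
    lon_bits = (bits + 1) // 2    # bits alternate, starting with longitude
    lat_bits = bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def approx_cells_in_bbox(min_lon, min_lat, max_lon, max_lat, level=7):
    """Roughly count the cells needed to tile a bounding box at this level."""
    cell_w, cell_h = geohash_cell_size_deg(level)
    cols = math.ceil((max_lon - min_lon) / cell_w)
    rows = math.ceil((max_lat - min_lat) / cell_h)
    return cols * rows


if __name__ == "__main__":
    # Hypothetical boxes, not taken from our data.
    print(approx_cells_in_bbox(0.0, 0.0, 0.01, 0.01))
    print(approx_cells_in_bbox(0.0, 0.0, 1.0, 1.0))
```

By this estimate a 0.01° by 0.01° box is a few dozen level-7 cells, while a 1° by 1° box is roughly half a million, which at least matches the pattern we see of small polygons being fast and large ones being slow.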
Again, it did work in the past with the same configuration, mapping, data and queries, and we cannot pinpoint a change in any of those variables, unless there is something we missed...
Any lead will help.