Hi folks,
We've recently tried to upgrade to 0.90.5 and have noticed a huge drop in
geo_shape query performance. As well as poor query latency, the volume of
resources required to answer simple geospatial operations has a huge
knock-on impact on other query types.
For our examples, we're executing a point lookup against a set of <100
documents representing geometries in the US. The documents exist in an
index with many other docs (~100mn), but are defined as a specific type in
the index, with an appropriate mapping and tree level. For these examples,
all metrics were taken from executing 50 queries per sample, ~4
concurrently. I ran two clusters, one on 0.20.4 and one on 0.90.5. Both
clusters have the same number of nodes, and the 0.90.5 cluster has about
2/3 the number of docs. Latency is measured using the "took" metric in the
ES JSON response.
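For anyone wanting to reproduce the measurement, here is a minimal sketch of the approach described above: fire 50 queries, about 4 at a time, read "took" from each response, and report the 50th and 90th percentiles. The `run_query` callable is hypothetical (it stands in for whatever client call returns the parsed ES JSON response), and the linear-interpolation percentile is an assumption, not necessarily how the numbers in this post were computed.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def run_sample(run_query, n_queries=50, concurrency=4):
    """Run n_queries calls to run_query, at most `concurrency` at a time.

    run_query is a hypothetical callable returning the parsed ES JSON
    response as a dict; only its "took" field (milliseconds) is read.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(run_query) for _ in range(n_queries)]
        return [f.result()["took"] for f in futures]


def summarize(took_ms):
    """Return the two latency percentiles used in the tables."""
    return {"p50": percentile(took_ms, 50), "p90": percentile(took_ms, 90)}
```

Note that "took" only covers time spent inside ES, so client and network overhead are excluded from these figures.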
Basic perf for searching for a point in a doc (query sample below):
sample     version   50-percentile (ms)   90-percentile (ms)
sample 1   0.20.4    98.5                 145.9
sample 2   0.20.4    90.5                 183.2
sample 1   0.90.5    3685.5               6211.9
sample 2   0.90.5    3588                 5366.3
Aside from the slower performance here, the impact on other ongoing queries
is significant. Whilst these queries are running, this is the latency we
see for very simple term lookups:
query                             version   50-percentile (ms)   90-percentile (ms)
term query                        0.20.4    27                   65
term with ongoing point queries   0.20.4    34                   116.5
term query                        0.90.5    52.5                 105.8
term with ongoing point queries   0.90.5    1745.5               3811.8
During testing, I also tried a few things on a single box. Here are some
things I observed:
- For point queries, CPU is entirely pegged to achieve results.
- On a bigger box with twice the CPUs, latency dropped to about 60% of
 what we see here.
- No obvious memory constraints.
- When I executed 10 point queries, 1456 MB was loaded into disk cache,
 compared to 8 MB when using a separate small index that had only the state
 documents.
- I ran some tests loading to a single shard; there seems to be a point
 where performance dropped a lot. Specifically:
   - between 250k and 400k docs, latency got about 10x worse (see included
     graph at end of message)
   - before this point, perf was actually quite reasonable on the small
     index
Notes on the configuration:
- 7 nodes, m1.large
- 1 replica
- 1 index, ~100mn docs
- geo mapping: { "type" : "geo_shape", "tree" : "quadtree",
 "tree_levels" : 9, "distance_error_pct" : 0.0 }
- query sample: {"constant_score": {"boost": 1, "filter": {"geo_shape":
 {"geometry": {"shape": {"type": "Point", "coordinates": [, ]},
 "relation": "intersects"}}}}}
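To make the configuration above easier to experiment with, here is a small sketch that builds the same query body and estimates how coarse the mapping's grid is. The coordinates passed in are placeholders, not the ones used in these tests, and the function names are made up for illustration. The cell-size helper assumes each quadtree level halves the world along both axes, which is my understanding of the quadtree prefix tree rather than something verified against either ES version.

```python
import json


def point_query(lon, lat, field="geometry"):
    """Build the constant_score/geo_shape point filter shown in the post.

    Coordinates follow GeoJSON order: [longitude, latitude].
    """
    return {
        "constant_score": {
            "boost": 1,
            "filter": {
                "geo_shape": {
                    field: {
                        "shape": {"type": "Point", "coordinates": [lon, lat]},
                        "relation": "intersects",
                    }
                }
            },
        }
    }


def quadtree_cell_degrees(tree_levels):
    """Approximate (width, height) in degrees of a cell at the deepest level.

    Assumes each level halves the world (360 x 180 degrees) on both axes.
    """
    cells_per_axis = 2 ** tree_levels
    return 360.0 / cells_per_axis, 180.0 / cells_per_axis


if __name__ == "__main__":
    # Placeholder coordinates, chosen only so the example runs.
    print(json.dumps(point_query(-98.0, 39.0)))
    print(quadtree_cell_degrees(9))
```

Under that assumption, "tree_levels": 9 gives cells of roughly 0.70 by 0.35 degrees, so the index itself should be fairly coarse for shapes of this size.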
I'm extremely confused about why we're seeing this performance difference,
especially after a version upgrade and a reduction in index size, and it's
blocking our migration. We noticed none of these issues with our previous
cluster, and have completed an ingest of the exact same mappings and
documents, just fewer of them, to the new cluster.
I would be very interested to hear about any solutions to or reasons for
this problem, and am more than happy to investigate further angles if
people have suggestions.
Cheers,
Oli
[image: Inline image 1]