Elasticsearch 7.0.0 geo_shape performance

joel.ortin · June 7, 2019, 8:48am

Hi,
We have migrated from ES 6.4.0 to ES 7.0.0 and removed the tree and precision parameters from our geo_shape field:
"area": {"type": "geo_shape", "tree": "quadtree", "precision": "2km"} =>
"area": {"type": "geo_shape"}
These parameters are reported as deprecated and will disappear in the future.
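For readers comparing the two configurations, here is a minimal Python sketch of both field definitions. The nested "stores" wrapper is an assumption inferred from the "stores.area" path used in the query later in this thread, and `stores_mapping` is a hypothetical helper, not part of any client library.

```python
import json

# Field definition used on ES 6.4.0: prefix-tree indexing with explicit parameters.
legacy_area = {"type": "geo_shape", "tree": "quadtree", "precision": "2km"}

# Field definition used on ES 7.0.0: the deprecated parameters are dropped.
default_area = {"type": "geo_shape"}


def stores_mapping(area_field):
    """Wrap an "area" definition in a nested "stores" object (assumed layout).

    Uses the typeless mapping form accepted by ES 7.
    """
    return {
        "mappings": {
            "properties": {
                "stores": {
                    "type": "nested",
                    "properties": {"area": area_field},
                }
            }
        }
    }


# Parameters that were removed during the migration.
removed = sorted(set(legacy_area) - set(default_area))
print(removed)
print(json.dumps(stores_mapping(default_area), indent=2))
```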
We noticed that indexing speed improved a lot, but queries are slower now.
We tested with the same index and the same data, only removing tree and precision from the geo_shape field.
The numbers are:

The index has 8,511,044 documents
Query time: 20 ms before, 560 ms now
Indexing time: 2 hours before, 3 minutes now
Size on disk: 21 GB before, 3.6 GB now
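To put those figures side by side, the ratios can be worked out with a few lines of Python. The variable names are illustrative, and the inputs are only the numbers reported above.

```python
# Figures reported in the post above.
query_ms_before, query_ms_after = 20, 560
index_min_before, index_min_after = 2 * 60, 3
disk_gb_before, disk_gb_after = 21.0, 3.6

# How much slower queries became, and how much indexing and disk usage improved.
query_slowdown = query_ms_after / query_ms_before
index_speedup = index_min_before / index_min_after
disk_reduction = disk_gb_before / disk_gb_after

print(f"query: {query_slowdown:.0f}x slower")
print(f"indexing: {index_speedup:.0f}x faster")
print(f"disk: {disk_reduction:.1f}x smaller")
```

In other words, queries are about 28 times slower, indexing is about 40 times faster, and the index is a bit under 6 times smaller.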

We assume that setting the tree value triggers extra work at indexing time that makes queries faster.
I guess there is no magic and we can't have the benefits of both options: it's either slower queries or slower indexing.
What will happen to query performance when this option disappears in the future?

Thanks

Ignacio_Vera · June 10, 2019, 11:49am

Hi,

To have more context about your performance results, could you share what kind of query you are executing (e.g. query by polygon, bounding box, etc.)?

Thanks!

joel.ortin · June 11, 2019, 7:32am

Hi Ignacio,

The geo_shapes I'm indexing are envelopes: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html#_envelope
The query I'm executing:
GET brochure_index_staging/_search
{
  "from": 0,
  "size": 6,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "geo_shape": {
                "stores.area": {
                  "shape": {
                    "type": "point",
                    "coordinates": [-3.700345516204834, 40.416690826416016]
                  },
                  "relation": "intersects"
                },
                "ignore_unmapped": false,
                "boost": 1.0
              }
            },
            "path": "stores",
            "ignore_unmapped": false,
            "score_mode": "none",
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "_source": {
    "includes": ["id", "name", "folder_name", "publication", "expiration", "retailer.name", "images"],
    "excludes": ["stores", "highlights", "country"]
  }
}

The same query gives the different timings I posted before in the two cases: 20 ms with the quadtree field specification and 560 ms without it.
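For anyone wanting to reproduce the comparison, the request body above can be built programmatically. The following is a sketch only: `point_in_store_area_query` is a hypothetical helper, and it leaves out the values that are assumed to be Elasticsearch defaults ("boost": 1.0, "ignore_unmapped": false, "adjust_pure_negative": true). Note that GeoJSON coordinates are ordered [longitude, latitude].

```python
import json


def point_in_store_area_query(lon, lat, size=6):
    """Build a search body matching documents whose stores.area intersects a point."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "stores",
                            "score_mode": "none",
                            "query": {
                                "geo_shape": {
                                    "stores.area": {
                                        # GeoJSON order: [lon, lat]
                                        "shape": {"type": "point", "coordinates": [lon, lat]},
                                        "relation": "intersects",
                                    }
                                }
                            },
                        }
                    }
                ]
            }
        },
        "_source": {
            "includes": ["id", "name", "folder_name", "publication",
                         "expiration", "retailer.name", "images"],
            "excludes": ["stores", "highlights", "country"],
        },
    }


body = point_in_store_area_query(-3.700345516204834, 40.416690826416016)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET brochure_index_staging/_search`; the "took" field of the response gives the server-side query time in milliseconds.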

Ignacio_Vera · June 11, 2019, 9:20am

Thanks Joel,

I would expect that query to be much faster on the new indexing strategy. I will have a look and report back.

Ignacio_Vera · June 11, 2019, 11:01am

Hi Joel,

I just indexed around 12 million envelopes in ES 7.0 and queries by point are very fast. The difference with your example is that I am not using nested fields. Questions:

How many envelopes do you have per document on average?
Are those envelopes generally disjoint, or do they overlap most of the time?

If you have time, it would be nice if you could reproduce the slowness without nested fields.

joel.ortin · June 11, 2019, 11:34am

Hi Ignacio,

1.- I have a total of 8,511,044 documents, counting nested docs.
Of these 8,511,044, 8,012,506 are nested documents with the geo_shape field.
2.- I cannot tell exactly. I think in most cases an envelope will overlap with a lot of other nested objects from other root objects, because there is a lot of duplicated data for filtering.

I will test it without nested fields, as you suggest.

Thanks

Ignacio_Vera · June 11, 2019, 11:40am

I have tried the same exercise with nested fields, just one envelope per nested field. Still the queries are running very fast (~20ms).

I am using just one shard in my index and no replicas.

Let me know if you see some improvement when not using nested fields.

Ignacio_Vera · June 11, 2019, 1:25pm

I was able to slow down the query to around 600 ms. The way I did it was by indexing lots of documents with overlapping envelopes. My point query then hits ~3 million documents, and that is slowing down the query.
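The effect of overlap on the number of hits can be illustrated outside Elasticsearch with a small Python sketch. It is a toy model, not how Elasticsearch evaluates the query: the helper names and the synthetic envelopes are made up, and dateline-crossing envelopes are not handled. Envelopes use the Elasticsearch layout [[minLon, maxLat], [maxLon, minLat]].

```python
def envelope_intersects_point(envelope, point):
    """Return True if point [lon, lat] lies inside or on the edge of the envelope."""
    (min_lon, max_lat), (max_lon, min_lat) = envelope
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def count_hits(envelopes, point):
    """Count the envelopes that a point query would match."""
    return sum(envelope_intersects_point(e, point) for e in envelopes)


point = [-3.700345516204834, 40.416690826416016]

# 1000 small envelopes laid side by side, none of them near the query point.
disjoint = [[[float(i), 1.0], [i + 0.5, 0.0]] for i in range(1000)]

# 1000 slightly different envelopes that all cover the query point.
overlapping = [[[-4.0 - i * 1e-3, 41.0], [-3.0, 40.0]] for i in range(1000)]

print(count_hits(disjoint, point), count_hits(overlapping, point))
```

With disjoint shapes the point matches almost nothing, while heavily overlapping shapes make every document a hit, so the query has more matching documents to collect.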

How many documents is your query matching?

joel.ortin · June 11, 2019, 2:20pm

Not as many docs as you.
I'm hitting 36364:
"hits" : {
  "total" : {
    "value" : 36364,
    "relation" : "eq"
  },
I'm now indexing the data without nested fields; when it's done I'll post the results of the query.
Thanks a lot
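As a side note for anyone scripting this comparison across versions: in ES 7 "hits.total" is an object with "value" and "relation", while in ES 6 it is a plain integer. A small sketch of a reader that copes with both, where `total_hits` is a hypothetical helper name:

```python
def total_hits(response):
    """Return (count, is_exact) from a search response dict.

    ES 7.x reports {"value": N, "relation": "eq" | "gte"};
    ES 6.x reports a plain integer, which is always exact.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"], total["relation"] == "eq"
    return total, True


es7_response = {"hits": {"total": {"value": 36364, "relation": "eq"}}}
es6_response = {"hits": {"total": 36364}}
print(total_hits(es7_response), total_hits(es6_response))
```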

joel.ortin · June 11, 2019, 3:09pm

I tested the query without nested fields and the result is about the same, maybe a little slower, but that is because the data is not exactly as before. I will try to make the query more selective, filtering more to get fewer hits, and see what happens.

joel.ortin · June 11, 2019, 3:43pm

Even with only 1 hit, it takes the same time.

Ignacio_Vera · June 19, 2019, 8:27am

We have noticed that there can be a degradation of performance when there are duplicated shapes in the index. This will be addressed in upcoming releases.
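To check whether a dataset is affected, one option is to count how many shapes are exact duplicates before indexing. The sketch below assumes the envelopes are available as plain coordinate lists in the [[minLon, maxLat], [maxLon, minLat]] layout; `duplication_report` is a hypothetical helper and the sample data is invented. It only measures duplication and does not change query performance by itself.

```python
from collections import Counter


def duplication_report(envelopes):
    """Summarise how many envelopes are exact repeats of an earlier one."""
    # Lists are not hashable, so convert each envelope to a tuple of tuples.
    counts = Counter(tuple(map(tuple, e)) for e in envelopes)
    total = sum(counts.values())
    distinct = len(counts)
    return {"total": total, "distinct": distinct, "duplicated": total - distinct}


madrid = [[-3.9, 40.6], [-3.5, 40.3]]
barcelona = [[2.0, 41.5], [2.3, 41.3]]
report = duplication_report([madrid, madrid, barcelona, madrid])
print(report)
```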

joel.ortin · June 19, 2019, 1:45pm

Ok.

Thank you very much

system · July 17, 2019, 1:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
