Elasticsearch 7.0.0 geo_shape performance

Hi,
We have migrated from ES 6.4.0 to ES 7.0.0 and removed the tree and precision parameters from our geo_shape field:
"area": {"type": "geo_shape", "tree": "quadtree", "precision": "2km"} =>
"area": {"type": "geo_shape"}
These parameters are flagged as deprecated and will disappear in a future release.
We noticed that indexing speed improved a lot, but queries are slower now.
We tested with the same index and the same data, only removing tree and precision from the geo_shape field.
The numbers are:

The index has 8,511,044 documents
The query took 20 ms before and now takes 560 ms
Indexing took 2 hours before and now takes 3 minutes
The size on disk was 21 GB before and is now 3.6 GB
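
To put the figures above side by side, here is a small sketch that turns them into ratios. It uses only the numbers quoted in this post; the variable names are just for illustration.

```python
# Before/after figures as quoted in the post above.
query_ms_before, query_ms_after = 20, 560
index_min_before, index_min_after = 2 * 60, 3   # 2 hours vs 3 minutes
disk_gb_before, disk_gb_after = 21.0, 3.6

query_slowdown = query_ms_after / query_ms_before    # how many times slower the query got
index_speedup = index_min_before / index_min_after   # how many times faster indexing got
disk_reduction = disk_gb_before / disk_gb_after      # how many times smaller the index got

print(f"query: {query_slowdown:.0f}x slower")
print(f"indexing: {index_speedup:.0f}x faster")
print(f"disk: {disk_reduction:.1f}x smaller")
```

So the query is roughly 28 times slower, while indexing is about 40 times faster and the index is close to 6 times smaller.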

We assume that the tree setting does extra work at index time that makes the query faster.
I guess there is no magic and we can't have the benefits of both options: it is either slower queries or slower indexing.
What will happen to query performance when this option disappears in the future?

Thanks

Hi,

To get more context on your performance results, could you share what kind of query you are executing (e.g. query by polygon, bounding box, etc.)?

Thanks!

Hi Ignacio,

The geo_shapes I'm indexing are envelopes: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html#_envelope
The query I'm executing:
GET brochure_index_staging/_search
{
  "from": 0,
  "size": 6,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "geo_shape": {
                "stores.area": {
                  "shape": {
                    "type": "point",
                    "coordinates": [-3.700345516204834, 40.416690826416016]
                  },
                  "relation": "intersects"
                },
                "ignore_unmapped": false,
                "boost": 1.0
              }
            },
            "path": "stores",
            "ignore_unmapped": false,
            "score_mode": "none",
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "_source": {
    "includes": ["id", "name", "folder_name", "publication", "expiration", "retailer.name", "images"],
    "excludes": ["stores", "highlights", "country"]
  }
}

The same query in both cases gives the different results I posted before: 20 ms with the quadtree field specification and 560 ms without it.

Thanks Joel,

I would expect that query to be much faster with the new indexing strategy. I will have a look and report back.

Hi Joel,

I just indexed around 12 million envelopes in ES 7.0 and queries by point are very fast. The difference with your example is that I am not using nested fields. Questions:

  1. How many envelopes do you have per document on average?
  2. Are those envelopes generally disjoint, or do they overlap most of the time?

If you have time, it would be nice if you could reproduce the slowness without nested fields.
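
For anyone wanting to try the non-nested variant, here is a rough sketch of what the request bodies could look like, built in Python so they can be printed and pasted into the console. The index name `brochure_flat_test` and the top-level field name `area` are assumptions for illustration, not something taken from the original index.

```python
import json

# Hypothetical index name for the flat (non-nested) test; not from the thread.
FLAT_INDEX = "brochure_flat_test"

# Mapping with the geo_shape field at the top level, without tree/precision.
flat_mapping = {
    "mappings": {
        "properties": {
            "area": {"type": "geo_shape"}
        }
    }
}

def point_query(field, lon, lat, size=6):
    """Build a geo_shape query for shapes that intersect the given point."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "geo_shape": {
                field: {
                    "shape": {"type": "point", "coordinates": [lon, lat]},
                    "relation": "intersects",
                }
            }
        },
    }

# Same point as in the query posted above, but with no nested wrapper.
flat_query = point_query("area", -3.700345516204834, 40.416690826416016)

print(f"PUT {FLAT_INDEX}")
print(json.dumps(flat_mapping, indent=2))
print(f"GET {FLAT_INDEX}/_search")
print(json.dumps(flat_query, indent=2))
```

The only difference from the original query is that the `nested` clause and the `stores.` prefix are gone, so any remaining slowness cannot be attributed to the nested join.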

Hi Ignacio,

1. I have a total of 8,511,044 documents, counting nested docs.
Of these 8,511,044, 8,012,506 are nested documents with the geo_shape field.
2. I cannot tell exactly. I think in most cases they will overlap with many nested objects from other root objects, because there is a lot of duplicated data for filtering.

I will test it without nested fields, as you suggest.

Thanks

I have tried the same exercise with nested fields, with just one envelope per nested field. The queries are still running very fast (~20ms).

I am using just one shard in my index and no replicas.

Let me know if you see some improvement when not using nested fields.

I was able to slow down the query to around 600ms. The way I did it was by indexing lots of documents with overlapping envelopes. Then my point query is hitting ~3 million documents and that is slowing down the query.

How many documents is your query matching?
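
One way to answer this is to run the same query with `size` set to 0 and `track_total_hits` enabled, so the response carries only the exact total. A minimal sketch of such a body follows; the helper name `count_matches_body` is made up for illustration, and the path and field names are the ones from the query posted earlier in this thread.

```python
import json

def count_matches_body(path, field, lon, lat):
    """Search body that fetches no documents and asks for an exact total."""
    return {
        "size": 0,                  # do not fetch any documents
        "track_total_hits": True,   # ask for an exact total rather than a capped one
        "query": {
            "nested": {
                "path": path,
                "score_mode": "none",
                "query": {
                    "geo_shape": {
                        f"{path}.{field}": {
                            "shape": {"type": "point", "coordinates": [lon, lat]},
                            "relation": "intersects",
                        }
                    }
                },
            }
        },
    }

body = count_matches_body("stores", "area", -3.700345516204834, 40.416690826416016)
print("GET brochure_index_staging/_search")
print(json.dumps(body, indent=2))
```

Note that with a nested query the total counts matching root documents, not the individual nested documents that matched.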

Not as many docs as you.
I'm hitting 36364:
"hits" : {
  "total" : {
    "value" : 36364,
    "relation" : "eq"
  },
I'm now indexing the data without nested fields. When it's done I'll post the results of the query.
Thanks a lot

I tested the query without nested fields, and the result is about the same, maybe a little slower, but that is because the data is not exactly the same as before. I will try to make the query more precise and filter more, to get fewer hits and see what happens.

Also, with only 1 hit it takes the same time.

We have noticed that there can be a degradation of performance when there are duplicated shapes in the index. This will be addressed in upcoming releases.
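
To get a feel for how many duplicated shapes a dataset contains before indexing it, something like the sketch below could be used. It assumes that "duplicated" means envelopes with exactly equal coordinates, which may be narrower than what triggers the slowdown, and the sample data is made up.

```python
from collections import Counter

def duplicate_envelopes(envelopes):
    """Return {envelope: count} for every envelope that appears more than once."""
    # Convert nested lists to tuples so they can be used as dictionary keys.
    counts = Counter(tuple(map(tuple, env)) for env in envelopes)
    return {env: n for env, n in counts.items() if n > 1}

# Made-up sample data: each envelope is [[min_lon, max_lat], [max_lon, min_lat]].
sample = [
    [[-3.8, 40.5], [-3.6, 40.3]],
    [[-3.8, 40.5], [-3.6, 40.3]],   # exact duplicate of the first
    [[2.0, 41.5], [2.3, 41.3]],
]

dupes = duplicate_envelopes(sample)
print(f"{len(dupes)} distinct envelope(s) duplicated")
```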

Ok.

Thank you very much
