Hi,
We have now migrated from ES 6.4.0 to 7.0.0, and we removed the tree and precision parameters from our geo_shape field:
"area": {"type": "geo_shape", "tree": "quadtree", "precision": "2km"} =>
"area": {"type": "geo_shape"}
These parameters are reported as deprecated and will be removed in a future version.
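For reference, the two mappings can be written out as create-index bodies. This is a minimal Python sketch: only the area field and its settings come from the post above. The surrounding body uses the typeless 7.x layout, and the helper name mapping_body is hypothetical. In 7.0, leaving out tree and precision means the field is indexed with the newer BKD-based approach instead of a prefix tree.

```python
import json

# Only the "area" field definitions come from the thread; the rest is illustrative.
# 6.4-style field: legacy prefix-tree indexing with an explicit tree and precision.
legacy_field = {"type": "geo_shape", "tree": "quadtree", "precision": "2km"}

# 7.0-style field: no tree/precision, so the default (BKD-based) indexing is used.
default_field = {"type": "geo_shape"}


def mapping_body(area_field):
    """Hypothetical helper: wrap a field definition in a typeless create-index body."""
    return {"mappings": {"properties": {"area": area_field}}}


if __name__ == "__main__":
    print(json.dumps(mapping_body(default_field), indent=2))
```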
We noticed that indexing speed improved a lot, but queries are now slower.
We tested the same index with the same data, only removing tree and precision from the geo_shape mapping.
The numbers are:
The index has 8.511.044 documents.
The query took 20 ms before and now takes 560 ms.
Indexing took 2 hours before and now takes 3 minutes.
Size on disk was 21 GB before and is now 3.6 GB.
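Put as ratios, the trade-off is stark. The quick calculation below uses only the figures reported above, with 2 hours taken as 120 minutes.

```python
# Figures reported in the post: same index, same data, only the mapping changed.
before = {"query_ms": 20, "index_min": 120, "disk_gb": 21.0}
after = {"query_ms": 560, "index_min": 3, "disk_gb": 3.6}

query_slowdown = after["query_ms"] / before["query_ms"]      # how many times slower queries got
indexing_speedup = before["index_min"] / after["index_min"]  # how many times faster indexing got
disk_reduction = before["disk_gb"] / after["disk_gb"]        # how many times smaller the index got

print(f"queries {query_slowdown:.0f}x slower, indexing {indexing_speedup:.0f}x faster, "
      f"{disk_reduction:.1f}x less disk")
# prints: queries 28x slower, indexing 40x faster, 5.8x less disk
```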
We assume that the tree setting does some work at indexing time that makes queries faster.
I guess there is no magic and we can't have the benefits of both options: it's either slower queries or slower indexing.
What will happen to query performance when this option is removed in the future?
To get more context on your performance results, could you share what kind of query you are executing (e.g. query by polygon, bounding box, etc.)?
I just indexed around 12 million envelopes in ES 7.0, and queries by point are very fast. The difference from your example is that I am not using nested fields. Questions:
How many envelopes do you have per document on average?
Are those envelopes generally disjoint, or do they overlap most of the time?
If you have time, it would be nice if you could reproduce the slowness without nested fields.
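A point query against the area field looks roughly like the sketch below, built as a plain Python dict in the Elasticsearch query DSL. When the shapes live in nested objects, the same geo_shape clause has to be wrapped in a nested query. The nested path name locations is hypothetical (the thread doesn't give it), as are the helper names.

```python
import json


def point_shape_query(field, lon, lat):
    """geo_shape clause matching shapes that intersect a single point."""
    return {
        "geo_shape": {
            field: {
                # GeoJSON coordinate order is [lon, lat].
                "shape": {"type": "point", "coordinates": [lon, lat]},
                "relation": "intersects",
            }
        }
    }


def search_body(lon, lat, nested_path=None):
    """Build the search body, wrapping it in a nested query if nested_path is given.

    nested_path is hypothetical: the thread does not name the nested field.
    """
    if nested_path is None:
        return {"query": point_shape_query("area", lon, lat)}
    inner = point_shape_query(f"{nested_path}.area", lon, lat)
    return {"query": {"nested": {"path": nested_path, "query": inner}}}


if __name__ == "__main__":
    print(json.dumps(search_body(2.17, 41.38, nested_path="locations"), indent=2))
```

Comparing the flat and nested variants on the same data is essentially the experiment suggested above.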
1.- I have a total of 8.511.044 documents, counting nested documents.
Of these 8.511.044, 8.012.506 are nested documents with the geo_shape field.
2.- I can't tell exactly. I think in most cases they will overlap with a lot of other nested objects from other root objects, because there is a lot of duplicated data for filtering.
I will try what you suggested and test it without nested fields.
I was able to slow the query down to around 600 ms. I did it by indexing lots of documents with overlapping envelopes, so my point query hits ~3 million documents, and that is slowing down the query.
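The reproduction described above (many overlapping envelopes, all matched by one point query) could be scripted along these lines. This is a hypothetical sketch: the index name, document layout, and envelope sizes are assumptions, and it only builds a _bulk request body; sending it is left to whatever client you use. Elasticsearch envelopes are given as upper-left then lower-right corner, i.e. [[min_lon, max_lat], [max_lon, min_lat]].

```python
import json
import random


def overlapping_envelopes(n, lon, lat, max_half_width=1.0, seed=0):
    """Yield n envelopes that all contain (lon, lat), so every pair overlaps.

    Keep (lon, lat) away from the poles and the antimeridian so the
    envelopes stay within valid coordinate ranges.
    """
    rng = random.Random(seed)
    for _ in range(n):
        # Random extent on each side of the point keeps it strictly inside.
        west = lon - rng.uniform(0.01, max_half_width)
        east = lon + rng.uniform(0.01, max_half_width)
        south = lat - rng.uniform(0.01, max_half_width)
        north = lat + rng.uniform(0.01, max_half_width)
        yield {"type": "envelope", "coordinates": [[west, north], [east, south]]}


def bulk_ndjson(index, shapes):
    """Render one document per shape as a _bulk body (action line + source line)."""
    lines = []
    for shape in shapes:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({"area": shape}))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline


if __name__ == "__main__":
    body = bulk_ndjson("shapes-test", overlapping_envelopes(3, 10.0, 20.0))
    print(body)
```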
Not as many docs as you. I'm hitting 36364:
"hits" : {
  "total" : {
    "value" : 36364,
    "relation" : "eq"
  },
I'm now indexing the data without nested fields. When it's done, I'll post the query results.
Thanks a lot
I tested the query without nested fields and the result is roughly the same, maybe a little slower, but the data is not exactly the same as before. I will try to make the query more selective and filter more so it returns fewer hits, and see what happens.
We have noticed that performance can degrade when there are duplicated shapes in the index. This will be addressed in upcoming releases.
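To get a rough idea of whether a dataset contains many repeated shapes, one could count exact duplicates before indexing. This is a hypothetical helper, not something from the thread: shapes are compared as canonical JSON, so it only finds exact repeats (for example, 1 and 1.0 are treated as different coordinates), and a high count only suggests, rather than confirms, that duplicates are behind the slowdown.

```python
import json
from collections import Counter


def duplicate_shape_counts(shapes):
    """Return {canonical_shape_json: count} for shapes that occur more than once.

    Shapes are serialized with sorted keys, so dicts with the same content in a
    different key order count as the same shape. Coordinates must match exactly.
    """
    counts = Counter(json.dumps(s, sort_keys=True) for s in shapes)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [
        {"type": "point", "coordinates": [1, 2]},
        {"coordinates": [1, 2], "type": "point"},  # same shape, different key order
        {"type": "point", "coordinates": [3, 4]},
    ]
    dups = duplicate_shape_counts(sample)
    extra_copies = sum(n - 1 for n in dups.values())
    print(len(dups), extra_copies)  # prints: 1 1
```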