GeoShape: Consuming more heap on data node

pokaleshrey · October 10, 2019, 10:23am

Version 7.1.1

My mapping settings consist of the following:

"geometry": {
  "type": "geo_shape",
  "tree": "quadtree",
  "precision": "8m"
}

When I check /_segments?verbose=true, I can see that the geometry field accounts for the largest share of memory_in_bytes (memory occupied on heap). All other text fields occupy very little heap.

I understand that geo_shape comes at the cost of more memory and disk space.

Question:
Is there a way to store this in a compressed format and still be able to query it?

Ignacio_Vera · October 10, 2019, 10:38am

Hi,

Have you tried the new indexing strategy introduced in ES 6.6.0? It still has some limitations, but its memory and disk footprint should be much lower.

pokaleshrey · October 10, 2019, 1:05pm

Hi @Ignacio_Vera,
Thanks for your reply.
I have seen this strategy but have not tried it yet.

The reason is that we have used the current strategy since 6.4.1 and it has given us accurate results.

Can you give some insight into how the 6.6.0 strategy differs from the one we have been using, in terms of accuracy?

Ignacio_Vera · October 11, 2019, 12:16pm

The recursive strategy describes the shape using the provided grid (in your case a quadtree). The logic computes all the cells that intersect the indexed shape at the given precision and stores that information in the inverted index.
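The decomposition described above can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's or Elasticsearch's implementation. It treats lon/lat as a flat plane over the world bounds and uses an axis-aligned rectangle as the indexed shape. Each quadrant gets a hypothetical digit (0-3), so a cell's path is the string of digits leading to it. Cells fully inside the shape stop the recursion early. Cells on the boundary are refined down to `max_depth`.

```python
from typing import List, Tuple

# Illustrative only: a flat lon/lat rectangle (min_x, min_y, max_x, max_y).
Rect = Tuple[float, float, float, float]
WORLD: Rect = (-180.0, -90.0, 180.0, 90.0)

def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def cover(shape: Rect, max_depth: int, cell: Rect = WORLD, path: str = "") -> List[str]:
    """Return hypothetical quadtree cell paths that together cover `shape`."""
    if not intersects(shape, cell):
        return []
    # Cells fully inside the shape, or at the finest level, become indexed terms.
    if contains(shape, cell) or len(path) == max_depth:
        return [path]
    min_x, min_y, max_x, max_y = cell
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    children = [
        (min_x, mid_y, mid_x, max_y),  # 0: north-west
        (mid_x, mid_y, max_x, max_y),  # 1: north-east
        (min_x, min_y, mid_x, mid_y),  # 2: south-west
        (mid_x, min_y, max_x, mid_y),  # 3: south-east
    ]
    terms: List[str] = []
    for digit, child in enumerate(children):
        terms += cover(shape, max_depth, child, path + str(digit))
    return terms
```

For example, `cover((0.0, 0.0, 10.0, 10.0), 1)` returns `['1']`, which is the single top-level quadrant (north-east) that the rectangle touches. Raising `max_depth` produces many more, and longer, paths.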

Every cell is described as a prefix path, and that path goes into the terms dictionary. The higher the precision, the more cells you need to describe your shape and the longer those paths become. This dictionary is loaded into heap, which is why you see high heap usage for that field. Unfortunately, the only way to decrease heap usage is to index your shapes at a lower precision, using either the precision or the distance_error_pct parameter.
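The following back-of-envelope sketch shows why precision drives both the term count and the path length. Its assumptions are mine, not Elasticsearch's formula. Cells halve in width at every quadtree level, and cell width is measured at the equator. The indexed shape is a square whose edges cross roughly perimeter / cell width finest-level cells.

```python
import math
from typing import Tuple

# Assumption: equatorial circumference from the WGS84 equatorial radius, in metres.
EQUATOR_M = 2 * math.pi * 6_378_137.0

def quadtree_levels(precision_m: float) -> int:
    """Smallest depth whose cell width at the equator is <= precision_m (rough estimate)."""
    if precision_m <= 0:
        raise ValueError("precision must be positive")
    levels = 0
    while EQUATOR_M / 2 ** levels > precision_m:
        levels += 1  # each level halves the cell width
    return levels

def boundary_terms(side_m: float, precision_m: float) -> Tuple[int, int, int]:
    """Rough (depth, boundary cell count, total path characters) for a square of side side_m."""
    levels = quadtree_levels(precision_m)
    cell_m = EQUATOR_M / 2 ** levels
    cells = math.ceil(4 * side_m / cell_m)  # cells crossed by the square's four edges
    return levels, cells, cells * levels    # one path character per level

for p in (50, 8):
    print(p, "m ->", boundary_terms(1000, p))
```

Under these assumptions, `quadtree_levels(8)` is 23, so each boundary term of an 8m-precision shape is a 23-level path. Relaxing the precision to 50m drops the depth to 20 and cuts the number of boundary cells by roughly a factor of eight.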

The new indexing strategy is based on Lucene's BKD tree. Shapes are tessellated into triangles, and each triangle is stored in the tree as a bounding box plus some extra information that helps reconstruct the original triangle. The precision of the shapes is limited only by the encoding used to store those vectors (1e-7 decimal degree precision).
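The quantization step can be illustrated as follows. This sketch is loosely modeled on the 32-bit latitude/longitude scaling used by Lucene's geo encoding, but it is not Lucene's code. The real triangle encoding also packs its extra reconstruction data more compactly than the plain vertex list kept here.

```python
import math
from typing import Dict, List, Tuple

LAT_SCALE = 2**32 / 180.0  # encoded steps per degree of latitude
LON_SCALE = 2**32 / 360.0  # encoded steps per degree of longitude
INT_MIN, INT_MAX = -(2**31), 2**31 - 1

Point = Tuple[float, float]  # (lon, lat)

def encode(value: float, scale: float) -> int:
    """Quantize a coordinate to a signed 32-bit integer, rounding down and clamping."""
    return max(INT_MIN, min(INT_MAX, math.floor(value * scale)))

def decode(encoded: int, scale: float) -> float:
    """Map an encoded integer back to degrees (lower edge of its step)."""
    return encoded / scale

def encode_triangle(tri: List[Point]) -> Dict[str, object]:
    """Quantize a triangle's vertices and derive the bounding box stored in the tree."""
    xs = [encode(lon, LON_SCALE) for lon, _ in tri]
    ys = [encode(lat, LAT_SCALE) for _, lat in tri]
    return {
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        # Simplification: keep all quantized vertices instead of the compact
        # "extra information" the real encoding uses to rebuild the triangle.
        "vertices": list(zip(xs, ys)),
    }
```

Each coordinate maps to one of 2^32 steps, so a round trip through `encode`/`decode` loses less than 360 / 2^32 ≈ 8.4e-8 degrees of longitude (half that for latitude). This is consistent with the 1e-7 figure above.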

The result is much faster indexing throughput, a smaller index, a smaller heap footprint and, in most cases, faster query throughput. There is also no need to set any extra parameters to get your data loaded into ES :).
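In practice, switching strategies means mapping the field as a plain geo_shape in a new index, because an existing field mapping cannot be changed in place. The sketch below builds such a create-index body in Python. It also flags the prefix-tree parameters (tree, tree_levels, precision, strategy, distance_error_pct, points_only) that, in 7.x, keep a field on the deprecated prefix-tree indexing. The index name shapes-v2 is hypothetical.

```python
import json

# In 7.x, setting any of these keeps a geo_shape field on prefix-tree indexing.
PREFIX_TREE_PARAMS = ("tree", "tree_levels", "precision", "strategy",
                      "distance_error_pct", "points_only")

def bkd_geo_shape_mapping(field: str) -> dict:
    """Create-index body with a plain geo_shape field (no prefix-tree options)."""
    return {"mappings": {"properties": {field: {"type": "geo_shape"}}}}

def uses_prefix_tree(field_mapping: dict) -> bool:
    """True if the field mapping sets any prefix-tree-specific parameter."""
    return any(p in field_mapping for p in PREFIX_TREE_PARAMS)

old = {"type": "geo_shape", "tree": "quadtree", "precision": "8m"}
new = bkd_geo_shape_mapping("geometry")
print(json.dumps(new, indent=2))  # body for PUT /shapes-v2 (hypothetical index name)
```

Data from the old index can then be copied over with the _reindex API.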

pokaleshrey · October 14, 2019, 4:09am

You are awesome. Thanks

system · November 11, 2019, 4:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Reducing geoshape tree_levels doubles query time (1.7)	Elasticsearch	4	556	July 5, 2017
6.4 to 7.3, After Upgrade Elasticsearch uses 50% more Disk storage	Elasticsearch	4	517	March 13, 2020
Elastic search 7.0.0 geo_shape performance	Elasticsearch	13	1279	July 17, 2019
Geo_shape query accuracy	Elasticsearch	14	928	December 16, 2019
After Elasticsearch 7.10->7.16 Upgrade Geo Shape Queries Cause Heap Problem ( G1 Humongous Allocation )	Elasticsearch	7	981	February 1, 2022