Ways to get all points in shapes / shapes in shapes with a geo_shape query (multiple pre-indexed shapes) on big geo-spatial data in ElasticSearch?

I have three ElasticSearch indices with geo-spatial data:

  • index-geodoc-geoshape-shapes-1: geographical shapes (e.g.: UK Output Areas with ~200k shapes as documents)

    • each document contains an output area shape and has the GeoJSON stored in a geometry property of type geo_shape
  • index-geodoc-geoshape-shapes-2: geographical shapes (e.g.: UK Statistical geography levels with ~500k shapes as documents)

    • each document contains a statistical geography shape and has the GeoJSON stored in a geometry property of type geo_shape
  • index-geodoc-geoshape-points-3: Points (~500k - 1m - 3m documents)

    • each document contains a point with some properties and has the coordinates (lat & long) stored in a location property of type geo_shape (I also tried type geo_point)
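For reference, a minimal sketch of the mappings described above, written as plain Python dicts that could be sent as index-creation bodies. Only the field names (geometry, location) and field types come from the description; everything else about the real mappings is omitted, and the points index is shown with the geo_point variant.

```python
import json

# Shape indices (shapes-1 and shapes-2): GeoJSON stored in a `geometry` field.
shapes_mapping = {
    "mappings": {
        "properties": {
            "geometry": {"type": "geo_shape"},
        }
    }
}

# Points index: coordinates stored in a `location` field.
# The geo_point variant is shown here; the geo_shape variant would only
# change the type value.
points_mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},
        }
    }
}

print(json.dumps(shapes_mapping, indent=2))
print(json.dumps(points_mapping, indent=2))
```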

What is the best way to get all the points in shapes, or all the shapes in shapes, across the 3 indices above with a geo_shape query, given the data volume (~500k - 3m documents in each index)?

In short: I'm doing an initial search to get all shape IDs (SearchAsync or ScrollAll).

Then, for each shape ID, I make a SearchAsync call with a geo_shape query that references the pre-indexed shape, to find the points or shapes that are within it or intersect it.
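To make the per-shape request concrete, here is a minimal sketch of the search body for one pre-indexed shape, built as a plain Python dict rather than with the C# client. The index and field names come from the description above; the shape ID "E00000001" and the helper name are made up for illustration.

```python
import json


def indexed_shape_query(shape_index, shape_id, shape_path="geometry",
                        target_field="location", relation="intersects"):
    """Build a search body that matches documents whose `target_field`
    has the given relation to a shape already stored in `shape_index`."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        target_field: {
                            "indexed_shape": {
                                "index": shape_index,  # index holding the shape
                                "id": shape_id,        # document ID of the shape
                                "path": shape_path,    # field containing the shape
                            },
                            "relation": relation,      # e.g. "intersects" or "within"
                        }
                    }
                }
            }
        }
    }


# "E00000001" is a hypothetical shape document ID.
body = indexed_shape_query("index-geodoc-geoshape-shapes-1", "E00000001")
print(json.dumps(body, indent=2))
```

This body would be sent to the search endpoint of index-geodoc-geoshape-points-3 (or of the second shapes index, with target_field set to "geometry").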

Is there a way to run a single search with a geo_shape query that uses multiple pre-indexed shapes from multiple documents?
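One thing that might work, as far as I can tell, is to wrap several geo_shape clauses, each with its own indexed_shape, in a bool query. The sketch below only builds the request body as a Python dict; the helper name and the shape IDs are hypothetical, and I have not measured whether this is actually faster than separate requests.

```python
import json


def multi_indexed_shape_query(shape_index, shape_ids, shape_path="geometry",
                              target_field="location", relation="intersects"):
    """Build one search body that matches documents related to ANY of the
    given pre-indexed shapes (one geo_shape clause per shape ID)."""
    clauses = [
        {
            "geo_shape": {
                target_field: {
                    "indexed_shape": {
                        "index": shape_index,
                        "id": shape_id,
                        "path": shape_path,
                    },
                    "relation": relation,
                }
            }
        }
        for shape_id in shape_ids
    ]
    # A document matches if at least one of the shape clauses matches.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


# Hypothetical shape document IDs.
body = multi_indexed_shape_query(
    "index-geodoc-geoshape-shapes-1", ["E00000001", "E00000002", "E00000003"]
)
print(json.dumps(body, indent=2))
```

A combined query like this returns the union of the matches without saying which shape each hit belongs to, and the cluster limits how many clauses a single query may contain, so the batches would have to stay small.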

I found that this requires a lot of processing time and power, and won't finish anywhere near real time.

So I thought of running a matchmaking process as a background job that searches for all the points in shapes (or shapes in shapes) and stores those relations in the same index or in an additional index, so that points in shapes can later be retrieved faster than by running the geo_shape query again.
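As a rough illustration of the "store the relation" step, the sketch below builds a _bulk request body (newline-delimited JSON) that writes the ID of the containing shape onto each matched point document. The field name output_area_id, the helper name, and the IDs are hypothetical; the real job would take the point IDs from the geo_shape search results.

```python
import json


def bulk_relation_actions(points_index, shape_id, point_ids,
                          relation_field="output_area_id"):
    """Build an NDJSON _bulk body with one partial update per point,
    storing the ID of the shape that contains it."""
    lines = []
    for point_id in point_ids:
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": points_index, "_id": point_id}}))
        # Payload line: partial document merged into the existing one.
        lines.append(json.dumps({"doc": {relation_field: shape_id}}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical shape and point IDs.
payload = bulk_relation_actions(
    "index-geodoc-geoshape-points-3", "E00000001", ["p-1", "p-2"]
)
print(payload)
```

Once the relation is stored, the points of a shape could be fetched with a simple term query on that field instead of a geo_shape query.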

Still, these background processes take a lot of time, and I'm getting some ElasticSearch or HTTP timeouts.

In this public GitHub gist, I have provided some parts of my implementation (it is not a working example, because I removed some parts for clarity and simplicity).


The client is a C# ASP.NET 5.0 console application, run from an Azure VM and tested on 2 tiers:

  • B2ms with 2vCPUs and 8GB RAM

  • D8s_v3 with 8vCPUs and 32GB RAM

The ElasticSearch server cluster (AZURE.DATA.HIGHCPU.D64SV3) was tested on 2 tiers:

  • 1 node with 30GB RAM and 240GB Storage

  • 2 nodes with 60GB RAM and 480GB Storage each

The ElasticSearch indices are configured with 10 primary shards and 1 replica.

I notice I no longer have the option to edit my post, so I'll add an additional note here:

In short: I'm doing an initial search to get all shape IDs (SearchAsync with Scroll, or ScrollAll).
Then, for each shape ID, I make a SearchAsync call (with Scroll or with SearchAfter) with a geo_shape query that references the pre-indexed shape, to find the points or shapes that are within it or intersect it.
This means that, with at least one request per shape ID, I'm sending 300k-500k-1m (or more) search requests to the ES server. What do you think? What would be a better way?
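One option I am considering to cut the number of HTTP round trips is the multi search API (_msearch), which accepts many searches in a single request. Below is a minimal sketch of building such a body in Python; the helper name, the batch of shape IDs, and the size value are hypothetical, and each search is still executed separately on the server, so this reduces request overhead rather than the geo work itself.

```python
import json


def msearch_body(points_index, shape_index, shape_ids, shape_path="geometry",
                 target_field="location", relation="intersects", size=1000):
    """Build an NDJSON _msearch body: one header line plus one search body
    per shape ID, each using a pre-indexed shape."""
    lines = []
    for shape_id in shape_ids:
        # Header line: which index this search runs against.
        lines.append(json.dumps({"index": points_index}))
        # Body line: the geo_shape search for this one shape.
        lines.append(json.dumps({
            "size": size,
            "query": {
                "bool": {
                    "filter": {
                        "geo_shape": {
                            target_field: {
                                "indexed_shape": {
                                    "index": shape_index,
                                    "id": shape_id,
                                    "path": shape_path,
                                },
                                "relation": relation,
                            }
                        }
                    }
                }
            },
        }))
    # The multi search API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical batch of shape document IDs.
payload = msearch_body(
    "index-geodoc-geoshape-points-3",
    "index-geodoc-geoshape-shapes-1",
    ["E00000001", "E00000002"],
)
print(payload)
```

The responses come back in the same order as the searches in the body, so each result set can be mapped back to its shape ID by position.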
