I have three Elasticsearch indices with geospatial data:
- `index-geodoc-geoshape-shapes-1`: geographical shapes (e.g. UK Output Areas, ~200k shapes as documents). Each document contains one output area shape and stores its GeoJSON in a `geometry` property of type `geo_shape`.
- `index-geodoc-geoshape-shapes-2`: geographical shapes (e.g. UK statistical geography levels, ~500k shapes as documents). Each document contains one statistical geography shape and stores its GeoJSON in a `geometry` property of type `geo_shape`.
- `index-geodoc-geoshape-points-3`: points (~500k, 1m or 3m documents). Each document contains a point with some properties and stores its coordinate (lat & long) in a `location` property of type `geo_shape` (I also tried storing it as type `geo_point`).
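For reference, the mappings described in the list would look roughly like the request bodies below. This is a minimal sketch: only the field names `geometry` and `location` and their types come from the description; the constant names and the example coordinate are illustrative, and all other settings are left to Elasticsearch defaults.

```python
# Hypothetical mapping bodies (the "mappings" part of a create-index request).
# Only the field names and types are taken from the question.
SHAPE_MAPPING = {
    "mappings": {
        "properties": {
            "geometry": {"type": "geo_shape"},  # GeoJSON polygon / multipolygon
        }
    }
}

# Variant 1 for the points index: location as a geo_shape field.
POINT_MAPPING_GEO_SHAPE = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_shape"},
        }
    }
}

# Variant 2 for the points index: location as a geo_point field.
POINT_MAPPING_GEO_POINT = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},
        }
    }
}

# Example point document for the geo_shape variant.
# GeoJSON coordinates are ordered [longitude, latitude].
example_point_doc = {
    "location": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
}
```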
What are the best ways to get all the points within shapes, or all the shapes within shapes, across the three indices above using a geo query (`geo_shape`), given the large volume of geospatial data (~500k to 3m documents in each index)?
In short: I first run a search to get all shape IDs (`SearchAsync` or `ScrollAll`).
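The ID-collection step can be sketched as the JSON body of the first request of a plain scroll, plus a helper that reads the IDs out of each page. `id_only_scroll_body` and `extract_ids` are hypothetical names; the response dict assumes the standard `hits.hits[]._id` layout of a search response. Disabling `_source` avoids shipping the (potentially large) GeoJSON when only the IDs are needed.

```python
from typing import Any, Dict, List


def id_only_scroll_body(page_size: int = 1000) -> Dict[str, Any]:
    """Body for the first request of a scroll over all shape documents.

    _source is disabled because only document IDs are needed, and sorting
    on _doc is the cheapest order for a full scan.
    """
    return {
        "size": page_size,
        "_source": False,
        "query": {"match_all": {}},
        "sort": ["_doc"],
    }


def extract_ids(search_response: Dict[str, Any]) -> List[str]:
    """Return the _id of every hit in one page of a search/scroll response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]
```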
Then, for each shape ID, I make a `SearchAsync` call with a `geo_shape` query that references the pre-indexed shape, to find the points or shapes that are within it or intersect it.
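The per-shape request can be sketched as the Query DSL body that such a `SearchAsync` call would send. It uses the `indexed_shape` option of the `geo_shape` query (`index`, `id`, `path`) and wraps it in a `bool` `filter` so that no relevance scores are computed. The function name and default values are assumptions for illustration; `relation` would be `"within"` when only fully contained shapes are wanted.

```python
from typing import Any, Dict


def points_in_shape_query(
    shape_index: str,
    shape_id: str,
    shape_path: str = "geometry",   # field holding the GeoJSON in the shape index
    point_field: str = "location",  # field holding the point in the points index
    relation: str = "intersects",   # or "within", "disjoint", ...
    page_size: int = 1000,
) -> Dict[str, Any]:
    """Build a search body that matches documents against one pre-indexed shape."""
    return {
        "size": page_size,
        "query": {
            "bool": {
                # Filter context: only matching matters, no scoring work.
                "filter": [
                    {
                        "geo_shape": {
                            point_field: {
                                "indexed_shape": {
                                    "index": shape_index,
                                    "id": shape_id,
                                    "path": shape_path,
                                },
                                "relation": relation,
                            }
                        }
                    }
                ]
            }
        },
    }
```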
Is there a way to run a search with a `geo_shape` query that uses multiple pre-indexed shapes from multiple documents?
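One option, sketched here under assumptions, is a `bool` query with one `should` clause per pre-indexed shape, each clause given a `_name` so that every hit's `matched_queries` lists the shape IDs it matched. The helper names and batch handling are hypothetical. A batch has to stay under the cluster's maximum clause count (`indices.query.bool.max_clause_count`, 1024 by default in 7.x), and each referenced shape is still fetched and evaluated on its own, so this most likely saves HTTP round trips rather than geometry work.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split shape IDs into batches small enough for one bool query."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def points_in_any_shape_query(
    shape_index: str,
    shape_ids: Sequence[str],
    shape_path: str = "geometry",
    point_field: str = "location",
    page_size: int = 1000,
) -> Dict[str, Any]:
    """One search body matching points against several pre-indexed shapes."""
    should = [
        {
            "bool": {
                "_name": shape_id,  # reported back in each hit's matched_queries
                "filter": [
                    {
                        "geo_shape": {
                            point_field: {
                                "indexed_shape": {
                                    "index": shape_index,
                                    "id": shape_id,
                                    "path": shape_path,
                                },
                                "relation": "intersects",
                            }
                        }
                    }
                ],
            }
        }
        for shape_id in shape_ids
    ]
    return {
        "size": page_size,
        "_source": False,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


def group_by_shape(search_response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each shape ID to the IDs of the hits that matched it."""
    grouped: Dict[str, List[str]] = {}
    for hit in search_response["hits"]["hits"]:
        for shape_id in hit.get("matched_queries", []):
            grouped.setdefault(shape_id, []).append(hit["_id"])
    return grouped
```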
I found that this requires a lot of processing time and power, and it doesn't finish anywhere near real time.
So I thought of running a matching process as a background job that finds all the points in shapes (or shapes in shapes) and stores those relations in the same index or in an additional index, so that I can get the points in a shape faster than by running the `geo_shape` query again.
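If the relations go into an additional index, the background job can write them in batches through the Bulk API, whose body is newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline. This sketch assumes a hypothetical relations index with one document per (shape, point) pair, and uses a deterministic `_id` so that rerunning the job overwrites existing relations instead of duplicating them.

```python
import json
from typing import Iterable, Tuple


def relations_bulk_body(relations_index: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Build an NDJSON Bulk API body indexing one document per (shape, point) pair."""
    lines = []
    for shape_id, point_id in pairs:
        # Deterministic ID: re-indexing the same pair replaces the old document.
        doc_id = f"{shape_id}|{point_id}"
        lines.append(json.dumps({"index": {"_index": relations_index, "_id": doc_id}}))
        lines.append(json.dumps({"shape_id": shape_id, "point_id": point_id}))
    if not lines:
        return ""  # nothing to send
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```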
However, this background process still takes a long time, and I'm getting Elasticsearch or HTTP timeouts.
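Timeouts in a job like this may come from sending too many concurrent searches to the cluster at once. A common mitigation, sketched here with Python's asyncio rather than the C# client, is to cap the number of in-flight requests and retry failed ones with exponential backoff. `search_one` is a hypothetical stand-in for the per-shape search, and the limits and the exception types treated as retryable are assumptions to tune.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Dict, Sequence


async def with_retries(
    call: Callable[[], Awaitable[Any]], attempts: int = 5, base_delay: float = 0.5
) -> Any:
    """Await call(), retrying timeouts/connection errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except (asyncio.TimeoutError, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # Exponential backoff with jitter: ~base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))


async def run_bounded(
    shape_ids: Sequence[str],
    search_one: Callable[[str], Awaitable[Any]],
    max_in_flight: int = 4,
    attempts: int = 5,
    base_delay: float = 0.5,
) -> Dict[str, Any]:
    """Run search_one for every shape ID with at most max_in_flight running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def worker(shape_id: str):
        # Retries happen while holding the slot, so the cap also covers retries.
        async with semaphore:
            result = await with_retries(
                lambda: search_one(shape_id), attempts=attempts, base_delay=base_delay
            )
            return shape_id, result

    pairs = await asyncio.gather(*(worker(s) for s in shape_ids))
    return dict(pairs)
```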
In this public GitHub gist, I've provided parts of my implementation (it is not a working example, because I removed some parts for clarity and simplicity).
The C# ASP.NET 5.0 console application runs on an Azure VM, tested on two tiers:
- B2ms with 2 vCPUs and 8 GB RAM
- D8s_v3 with 8 vCPUs and 32 GB RAM
The Elasticsearch cluster (AZURE.DATA.HIGHCPU.D64SV3) was tested on two tiers:
- 1 node with 30 GB RAM and 240 GB storage
- 2 nodes with 60 GB RAM and 480 GB storage each
Each Elasticsearch index is created with 10 primary shards and 1 replica.
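As a back-of-the-envelope check on that layout: Elasticsearch never allocates a replica on the same node as its primary, so on the 1-node tier the replicas stay unassigned, while on the 2-node tier each node carries roughly half of all shard copies. The sketch below only does that arithmetic for the three indices described above; it assumes evenly balanced shards, reads "1 replica" as one replica per primary, and ignores any other indices on the cluster.

```python
def total_shard_copies(indices: int, primaries: int, replicas: int) -> int:
    """All shard copies requested: each primary plus its replicas."""
    return indices * primaries * (1 + replicas)


def assigned_shard_copies(indices: int, primaries: int, replicas: int, nodes: int) -> int:
    """Copies that can actually be allocated.

    Every copy of the same shard must live on a different node, so at most
    `nodes` copies of any one shard (1 primary + nodes - 1 replicas) can be assigned.
    """
    copies_per_shard = min(1 + replicas, nodes)
    return indices * primaries * copies_per_shard


# Three indices, 10 primaries each, 1 replica each.
requested = total_shard_copies(3, 10, 1)
on_one_node = assigned_shard_copies(3, 10, 1, nodes=1)
on_two_nodes = assigned_shard_copies(3, 10, 1, nodes=2)
```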