Hi @davidb1, I think the issue is that Elasticsearch cannot parse the multigeometries in your file that have several holes.
I tried several ways to clean and validate your dataset before uploading it. Finally, using ogr2ogr, I managed to get a bit more detail from the error returned by the tool:
...
{
"type":"mapper_parsing_exception",
"reason":"failed to parse field [geometry] of type [geo_shape]",
"caused_by":{
"type":"invalid_shape_exception",
"reason":"Invalid shape: Hole is not within polygon"}}}
...
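To find which features trigger this error before indexing, a rough pre-check along these lines might help. It is only a sketch: it assumes GeoJSON-style polygon coordinates (first ring is the outer shell, the rest are holes), the function names are made up for illustration, and it only tests hole vertices with a ray-casting check, which is not the validation Elasticsearch itself performs.

```python
def point_in_ring(point, ring):
    """Ray-casting test: True if point lies strictly inside the closed ring.

    Points exactly on the boundary may be classified either way.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def holes_outside_shell(polygon_coords):
    """Return the ring indexes (1-based) of holes with a vertex outside the shell."""
    shell, *holes = polygon_coords
    bad = []
    for idx, hole in enumerate(holes, start=1):
        if not all(point_in_ring(pt, shell) for pt in hole):
            bad.append(idx)
    return bad


# Illustrative data only: one valid hole and one that sticks out of the shell.
shell = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
good_hole = [[2, 2], [4, 2], [4, 4], [2, 4], [2, 2]]
bad_hole = [[8, 8], [12, 8], [12, 12], [8, 12], [8, 8]]

suspicious = holes_outside_shell([shell, good_hole, bad_hole])
print(suspicious)
```

Features flagged this way are candidates for repair or removal before the upload; a hole that merely touches the shell may still need a proper geometry validator.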
Can I ask what's the use case for a dataset like this?
Note: I had moved this topic to the Kibana forum, but I'm moving it back, since this appears to be more of an issue for the Elasticsearch team: the error showed up not only when indexing from Kibana but also from ogr2ogr.
Thanks for explaining the use case. I'd suggest looking for a land polygon dataset and running the opposite query: find the points in the documents of your indices that don't intersect land. That will greatly reduce the chance of being hit by complex geometries.
Still, for good performance you need your land/water polygons to be split so the spatial index can test against smaller geometries; otherwise your queries will be really slow.
You may want to import more detailed datasets than the well-known Natural Earth. The German OSM community maintains very detailed land/water datasets. These are large files, so uploading them with Kibana is not an option. I'd suggest following the procedure detailed in this blog post.
Beware that even the simplified dataset for the land polygons is going to result in a 3GB index in your cluster with 694381 documents.
Example of the precision of the simplified dataset for the west coast of Scotland.
I forgot to mention something that is actually relevant: you can only run this kind of analysis in Elasticsearch using the enrich processor, because Elasticsearch geo queries can only point to a single document through the Preindexed Shape option.
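To illustrate that limitation, here is a sketch of a `geo_shape` query body that uses a pre-indexed shape. The index name `land_polygons`, the document id, and the field names `location` and `geometry` are assumptions for illustration. Note that `indexed_shape` takes exactly one `id`, so a single query can only be compared against one reference document.

```python
import json


def disjoint_from_indexed_shape(field, shape_index, shape_id, shape_path):
    """Build a geo_shape query body referencing ONE pre-indexed shape document."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "indexed_shape": {
                                "index": shape_index,  # hypothetical reference index
                                "id": shape_id,        # a single document id
                                "path": shape_path,    # field holding the shape
                            },
                            "relation": "disjoint",
                        }
                    }
                }
            }
        }
    }


# All names below are placeholders, not taken from the original dataset.
body = disjoint_from_indexed_shape("location", "land_polygons", "some-doc-id", "geometry")
print(json.dumps(body, indent=2))
```

With a reference index of several hundred thousand land polygons, issuing one such query per polygon is not practical, which is why the enrich processor is the better fit here.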
With the enrich processor you could use your land polygons index as reference data and apply the DISJOINT spatial relation to mark the documents that don't intersect any document of your reference polygons. That way, later at query time, you only need to filter by that field.
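As a rough sketch of that setup, the request bodies for a `geo_match` enrich policy and an ingest pipeline with an enrich processor could look like the following. The index, policy, and field names (`land_polygons`, `land-policy`, `geometry`, `location`, `land_match`) are placeholders, and you should check how DISJOINT matching behaves on your own data before relying on it.

```python
import json

# Body for: PUT /_enrich/policy/land-policy  (policy name is a placeholder)
enrich_policy = {
    "geo_match": {
        "indices": "land_polygons",   # hypothetical reference index
        "match_field": "geometry",    # geo_shape field in the reference index
        "enrich_fields": ["name"],    # hypothetical field copied on a match
    }
}

# The policy must be executed before use:
#   POST /_enrich/policy/land-policy/_execute

# Body for: PUT /_ingest/pipeline/mark-land  (pipeline name is a placeholder)
ingest_pipeline = {
    "description": "Mark documents by their relation to the land polygons",
    "processors": [
        {
            "enrich": {
                "policy_name": "land-policy",
                "field": "location",          # geo field of the incoming document
                "target_field": "land_match", # field to filter on at query time
                "shape_relation": "DISJOINT",
            }
        }
    ],
}

print(json.dumps(enrich_policy, indent=2))
print(json.dumps(ingest_pipeline, indent=2))
```

Documents indexed through the pipeline then carry the `land_match` field when the relation holds, so queries can filter on whether that field exists instead of evaluating geometries at search time.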