Also, when you use ogr2ogr (GDAL) to insert the geometry, i.e. either a shapefile or GeoJSON, it extracts only the features and inserts them
I've seen ogr2ogr produce a FeatureCollection. Actually, the speedlimit example was from the San Francisco speed limits shapefile, which I had converted to GeoJSON with ogr2ogr:
$ ogr2ogr -f GeoJSON -t_srs crs:84 speedlimits.json sfmta_speedlimits/MTA_DPT_SpeedLimits.shp
$ head speedlimits.json
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "cnn": ...
This shows ogr2ogr producing a FeatureCollection, while the bulk API needs the features individually, hence:
jq -c '.features[] | select(.properties.speedlimit > 0)' speedlimits.json | sed -e 's/^/{ "index" : { "_index" : "speedlimit", "_type" : "speedlimit" } }\n/' > speedlimits_filtered.json
curl -s -XPOST 'http://127.0.0.1:9200/_bulk' --data-binary @speedlimits_filtered.json
The above works.
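As an aside, the sed prefix can be avoided entirely: jq can emit the bulk action line and the feature document as two separate outputs, which is exactly the alternating format the _bulk API expects. A minimal sketch, using a small inline stand-in for speedlimits.json:

```shell
# Stand-in for speedlimits.json so the sketch is self-contained
cat > sample.json <<'EOF'
{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"speedlimit":25},"geometry":{"type":"LineString","coordinates":[[0,0],[1,1]]}},
 {"type":"Feature","properties":{"speedlimit":0},"geometry":{"type":"LineString","coordinates":[[2,2],[3,3]]}}
]}
EOF

# For each kept feature, emit the index action and then the feature itself,
# each on its own line -- the action/source pairing _bulk requires.
jq -c '.features[]
       | select(.properties.speedlimit > 0)
       | {index: {_index: "speedlimit", _type: "speedlimit"}}, .' \
  sample.json > bulk_body.json
```

Run against the real speedlimits.json instead of the stand-in, the resulting file can be POSTed to _bulk exactly as above.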
As to the very large geometries, 100k vertices is certainly at the high end of the complexity I've seen. I've personally only indexed shapes with up to about 10k vertices, but I've anecdotally heard of people indexing the entire outlines of countries like the United States and Russia. I will say that your performance with 100k vertices is unlikely to be very good, so I would recommend smoothing the shapes out before indexing if you can. The most common reason for such high vertex counts is higher accuracy, but as the page Mark linked to says, Elasticsearch does not provide 100% accuracy on geo shapes anyway; if smoothing is possible, you'll likely decrease the work Elasticsearch has to do without really changing the net effect.
I'll also mention that there are some known core performance issues here -- especially with very high-complexity shapes -- in that geo_shapes haven't yet been moved over to our BKD tree structure; only geo_points have so far. We're working on that.
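If it helps, vertex counts can be checked up front with jq before deciding whether to smooth. A rough sketch -- the budget of 2 and the cnn id field are just examples, and a tiny inline FeatureCollection stands in for the real file (this counts points in LineStrings; Polygons would need their ring arrays flattened first):

```shell
# Stand-in FeatureCollection; substitute speedlimits.json for real data
cat > shapes.json <<'EOF'
{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"cnn":1},"geometry":{"type":"LineString","coordinates":[[0,0],[1,1],[2,2]]}},
 {"type":"Feature","properties":{"cnn":2},"geometry":{"type":"LineString","coordinates":[[0,0],[1,1]]}}
]}
EOF

# Report each feature's vertex count and flag anything over an example budget of 2.
# Flagged features are candidates for simplification before indexing, e.g. with
# GDAL: ogr2ogr -f GeoJSON -simplify <tolerance> out.json in.shp
jq -c '.features[]
       | {id: .properties.cnn, vertices: (.geometry.coordinates | length)}
       | select(.vertices > 2)' shapes.json
```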
Anyway, I have a suspicion that your errors have to do with the shape format rather than the complexity. Can you confirm that the data you're inserting is only of the types (e.g. LineString/Polygon) that Mark posted? Also, can you confirm that each of the shapes conforms to the specifications on that page? For example, on that page you'll see the following note:
IMPORTANT NOTE: GeoJSON does not mandate a specific order for vertices thus ambiguous polygons around the dateline and poles are possible. To alleviate ambiguity the Open Geospatial Consortium (OGC) Simple Feature Access specification defines the following vertex ordering:
Outer Ring - Counterclockwise
Inner Ring(s) / Holes - Clockwise
For polygons that do not cross the dateline, vertex order will not matter in Elasticsearch. For polygons that do cross the dateline, Elasticsearch requires vertex ordering to comply with the OGC specification. Otherwise, an unintended polygon may be created and unexpected query/filter results will be returned.
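Ring orientation is easy to check mechanically with the shoelace formula: a positive signed area means counterclockwise, which is the OGC order for outer rings. A sketch in jq, assuming a single-ring Polygon (the unit square here is a made-up example):

```shell
# Hypothetical polygon: a counterclockwise unit square with a closing vertex
cat > polygon.json <<'EOF'
{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]]]}
EOF

# Shoelace formula over the outer ring: sum of x_i*y_{i+1} - x_{i+1}*y_i.
# Positive => counterclockwise (OGC outer-ring order); negative => clockwise.
jq -r '.coordinates[0] as $ring
       | [range(0; ($ring | length) - 1)
          | ($ring[.][0] * $ring[.+1][1]) - ($ring[.+1][0] * $ring[.][1])]
       | add
       | if . > 0 then "counterclockwise" else "clockwise" end' polygon.json
```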
Other shape types have other restrictions on that page, e.g. radius is required for circle shapes, "For all types, both the inner type and coordinates fields are required", etc.
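For instance, a circle shape satisfying those requirements would look like the following. This is a hypothetical document -- the "location" field name and the values are made up -- and the jq line just sanity-checks that the three required keys are present:

```shell
# Hypothetical geo_shape circle document: "type" and "coordinates" are required
# for every shape type, and "radius" is additionally required for circles.
cat > circle.json <<'EOF'
{"location": {"type": "circle", "coordinates": [-122.42, 37.77], "radius": "100m"}}
EOF

# Verify all three required fields are present
jq -e '.location | has("type") and has("coordinates") and has("radius")' circle.json
```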