What is a Valid Geometry in ES?


(AJ) #1

My question repeats an earlier post, but since that one got no response, I had to ask again.

This is causing us a lot of problems: we simply can't import the spatial data into ES. Although we have validated those polygons, they still produce errors. Can someone please advise?

One other issue: does anyone know a good way to generate GeoJSON files from shapefiles that provide one FeatureCollection for each polygon in the shapefile, rather than one FeatureCollection for the whole generated GeoJSON?


(Mark Walkom) #2

There is documentation on this:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/geo-point.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/geo-shape.html


(Shane Connelly) #3

As you check those out, you may notice (based upon your post) that Elasticsearch doesn't support FeatureCollection GeoJSON objects: you have to break those up into their constituent Polygon/LineString shapes first and then index each one individually. FWIW, with jq it's fairly easy to take a top-level FeatureCollection GeoJSON file and produce a set of features, even with filtering. For example, the following takes a speed limit GeoJSON FeatureCollection file, selects only the features whose speedlimit is greater than 0, puts them in _bulk format, and writes them out to a speedlimits_filtered.json file:

jq -c '.features[] | select(.properties.speedlimit > 0)' speedlimits.json | sed -e 's/^/{ "index" : { "_index" : "speedlimit", "_type" : "speedlimit" } }\n/' > speedlimits_filtered.json
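The sed step relies on GNU sed turning \n in the replacement into a newline (each _bulk action line must sit on its own line above the document); BSD/macOS sed does not do this. A portable sketch of the same step in awk, assuming the input is already one feature per line, as jq -c produces. `to_bulk` is a hypothetical helper name, and the index/type names are just the ones from the example above:

```shell
# Hypothetical helper: prefix every non-empty input line (one JSON document
# per line) with a _bulk "index" action line.
# $1 = index name, $2 = type name (5.x-era _bulk still carries _type).
to_bulk() {
  awk -v idx="$1" -v typ="$2" '
    NF {
      printf "{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }\n", idx, typ
      print
    }'
}

# Usage (assumes jq is installed):
#   jq -c '.features[] | select(.properties.speedlimit > 0)' speedlimits.json \
#     | to_bulk speedlimit speedlimit > speedlimits_filtered.json
```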


(AJ) #4

Thanks a lot for this. Actually, this is not really the issue here.

By valid geometry, I meant errors about the spatial validity of the shape itself, such as 'self-intersection', which we get despite having validated the geometry in both ESRI and PostGIS.

The second, and worse, issue is a constant 'array out of bounds exception'; I posted another question about this. Some of the geometries I am trying to insert are really big, i.e. they contain more than 100k vertices, and Elasticsearch just gives me errors. I don't know whether there is a limit on this, or a way to increase it.
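To see which features are the very large ones before indexing, here is a rough sketch that counts coordinate positions per feature. It assumes one GeoJSON feature per line (e.g. the output of jq -c '.features[]'). It is a text heuristic, not a JSON parser: it counts every "[" directly followed by a number, so numeric arrays inside properties would be counted too. `count_positions` is a hypothetical helper name.

```shell
# Hypothetical helper: for each input line (one GeoJSON feature), print
# "<line number><TAB><number of coordinate positions>".
# Heuristic: a position is any "[" followed (after optional spaces) by a number.
count_positions() {
  awk '{
    n = gsub(/\[[ ]*-?[0-9.]/, "&")   # gsub returns the match count; "&" keeps the text unchanged
    print NR "\t" n
  }'
}

# Usage: list the five largest features in a newline-delimited file.
#   count_positions < features.ndjson | sort -k2,2n | tail -5
```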

Also, when you use ogr2ogr (GDAL) to insert the geometry from either a shapefile or GeoJSON, it does extract only the features and insert them, but I get a lot of "array-out-of-bounds exception" errors on the client side, with the following corresponding entry on the server:

 [2017-07-26T12:02:36,395][DEBUG][o.e.a.b.TransportShardBulkAction] [my_node] [index_name][4] failed to execute bulk item (index) BulkShardRequest [[index_name][4]] containing [index {[index_name][FeatureCollection][AV1-j4-ITghY5gkRJfoj], source[n/a, actual length: [4mb], max length: 2kb]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [geometry]
	at org.elasticsearch.index.mapper.GeoShapeFieldMapper.parse(GeoShapeFieldMapper.java:473) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:450) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:467) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:383) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:277) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:536) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:513) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary(TransportShardBulkAction.java:450) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:458) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:143) [elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:113) [elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69) [elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939) [elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908) [elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.4.1.jar:5.4.1]

(Shane Connelly) #5

Also, when you use ogr2ogr (GDAL) to insert the geometry i.e. either shapefile or geojson, it does extract only the features and insert them

I've seen ogr2ogr produce a FeatureCollection. Actually, the speedlimit example was from the San Francisco speed limits shapefile which I had used ogr2ogr to convert to GeoJSON.

$ ogr2ogr -f GeoJSON -t_srs crs:84 speedlimits.json sfmta_speedlimits/MTA_DPT_SpeedLimits.shp
$ head speedlimits.json
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },

"features": [
{ "type": "Feature", "properties": { "cnn": ...

This shows ogr2ogr producing a FeatureCollection, and we need the features individually, hence:

jq -c '.features[] | select(.properties.speedlimit > 0)' speedlimits.json | sed -e 's/^/{ "index" : { "_index" : "speedlimit", "_type" : "speedlimit" } }\n/' > speedlimits_filtered.json
curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'http://127.0.0.1:9200/_bulk' --data-binary @speedlimits_filtered.json

The above works.
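One thing to watch when testing this: _bulk returns HTTP 200 even when individual items fail. The per-item failures are reported in the response body, which carries a top-level "errors" flag. A minimal sketch of a check, assuming the response is piped in unchanged; `bulk_ok` is a hypothetical helper, and grepping the text is a shortcut rather than real JSON parsing:

```shell
# Hypothetical helper: read a _bulk response on stdin and return non-zero
# if the top-level "errors" flag is true (i.e. at least one item failed).
bulk_ok() {
  if grep -Eq '"errors"[[:space:]]*:[[:space:]]*true'; then
    echo "bulk request had per-item failures; inspect the items array" >&2
    return 1
  fi
}

# Usage:
#   curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'http://127.0.0.1:9200/_bulk' \
#     --data-binary @speedlimits_filtered.json | bulk_ok
```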

As to the very large geometries, 100k vertices is certainly at the high end of the complexity I've seen. I've personally only indexed shapes with up to about 10k vertices, but I've anecdotally heard of people indexing the entire outlines of countries like the United States and Russia. Your performance with 100k vertices is unlikely to be very good, so I would recommend smoothing the shapes before indexing if you can. The most common reason for such high vertex counts is higher accuracy, but as the page Mark linked to says, Elasticsearch does not provide 100% accuracy on geo shapes anyway. If smoothing is possible, you'll likely decrease the workload Elasticsearch has to do without really changing the net effect. I'll also mention that there are some known core performance issues here, especially with very high-complexity shapes, because geo_shapes haven't yet been moved over to our BKD tree structure. Only geo_points have so far. We're working on that.
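Smoothing can be done in many tools (PostGIS has ST_Simplify and ST_SimplifyPreserveTopology, and ogr2ogr has a -simplify option). To illustrate what such simplification does, here is a Douglas-Peucker sketch in awk for a single ring or line given as one "x y" pair per line. This is a swapped-in illustration, not a method anyone in this thread described. The tolerance is in coordinate units (degrees for CRS84), and because each ring is simplified on its own, the result can gain new self-intersections, which is what ST_SimplifyPreserveTopology is meant to avoid. `simplify_ring` is a hypothetical name.

```shell
# Hypothetical helper: Douglas-Peucker simplification of ONE ring/line.
# stdin: one "x y" pair per line; $1: tolerance in coordinate units.
simplify_ring() {
  awk -v tol="${1:-0}" '
    # Perpendicular distance from point i to the line through a and b.
    # If a and b coincide (e.g. a closed ring), use point-to-point distance.
    function pdist(i, a, b,   dx, dy, len, c, ex, ey) {
      dx = X[b] - X[a]; dy = Y[b] - Y[a]
      len = sqrt(dx * dx + dy * dy)
      if (len == 0) { ex = X[i] - X[a]; ey = Y[i] - Y[a]; return sqrt(ex * ex + ey * ey) }
      c = dx * (Y[a] - Y[i]) - dy * (X[a] - X[i])   # cross product
      return (c < 0 ? -c : c) / len
    }
    # Keep the farthest point between a and b if it exceeds tol, then recurse
    # on both halves; points never marked in keep[] are dropped.
    function dp(a, b,   i, d, dmax, idx) {
      dmax = 0; idx = 0
      for (i = a + 1; i < b; i++) {
        d = pdist(i, a, b)
        if (d > dmax) { dmax = d; idx = i }
      }
      if (dmax > tol) { keep[idx] = 1; dp(a, idx); dp(idx, b) }
    }
    { n++; X[n] = $1; Y[n] = $2 }
    END {
      if (n == 0) exit
      keep[1] = 1; keep[n] = 1          # endpoints are always kept
      if (n > 2) dp(1, n)
      for (i = 1; i <= n; i++) if (keep[i]) print X[i], Y[i]
    }'
}

# Usage: simplify_ring 0.001 < ring.txt > ring_simplified.txt
```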

Anyway, I suspect your errors have to do with the shape format rather than complexity. Can you confirm that the data you're inserting contains only the types (e.g. LineString/Polygon) listed on the page Mark posted? Also, can you confirm that each of the shapes conforms to the specifications on that page? For example, on the page you'll see the following note:

IMPORTANT NOTE: GeoJSON does not mandate a specific order for vertices, thus ambiguous polygons around the dateline and poles are possible. To alleviate ambiguity, the Open Geospatial Consortium (OGC) Simple Feature Access specification defines the following vertex ordering:

Outer Ring - Counterclockwise
Inner Ring(s) / Holes - Clockwise
For polygons that do not cross the dateline, vertex order will not matter in Elasticsearch. For polygons that do cross the dateline, Elasticsearch requires vertex ordering to comply with the OGC specification. Otherwise, an unintended polygon may be created and unexpected query/filter results will be returned.
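A quick way to check the ordering described in that note is the shoelace formula: twice the signed area of a ring is the sum of x_i*y_{i+1} - x_{i+1}*y_i over its edges, which is positive for a counterclockwise ring and negative for a clockwise one (taking x = longitude, y = latitude). A minimal sketch, assuming one closed ring given as "x y" lines; `ring_orientation` is a hypothetical helper. Per the note, the ordering only matters for polygons that cross the dateline.

```shell
# Hypothetical helper: report the winding order of ONE closed ring.
# stdin: one "x y" (lon lat) pair per line, first vertex repeated at the end.
ring_orientation() {
  awk '
    { n++; X[n] = $1; Y[n] = $2 }
    END {
      area2 = 0
      # Shoelace sum over consecutive vertex pairs; because a closed ring
      # repeats its first vertex at the end, the closing edge is included.
      for (i = 1; i < n; i++) area2 += X[i] * Y[i + 1] - X[i + 1] * Y[i]
      if (area2 > 0) print "counterclockwise"
      else if (area2 < 0) print "clockwise"
      else print "degenerate"
    }'
}

# Usage: outer rings should print counterclockwise, holes clockwise.
#   ring_orientation < outer_ring.txt
```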

Other shape types have other restrictions on that page, e.g. radius is required for circle shapes, and for all types, both the inner type and coordinates fields are required.


(AJ) #6

Hi Shane,
Thanks for this, much appreciated.

I used your example to extract a JSON file in _bulk format and tried it. Unfortunately, I get the same problem.

Actually, the 100k-vertex geometries are the smoothed ones (we have some with 500k vertices), which is what happens when working with coastlines. And yes, I can confirm that we are inserting Polygons or MultiPolygons, which are in the geo_shape spec. The dataset we are working with is complex, and after simplifying I am trying to find a compromise between what works and where it fails. Currently, we are not having much success with ES on our big geometries.

Also, to reiterate what I mentioned before, some geometries that have been cleansed/validated in both ArcGIS and PostGIS still throw errors like 'self-intersection' or 'Points of LinearRing do not form a closed Linestring', so it's a bit challenging to go back and 'fix' them. But this is less of a worry; the main one is the big geometries failing.


(Shane Connelly) #7

It sounds like you have several different errors, and they probably all need to be treated separately. "Points of LinearRing do not form a closed linestring" is a separate error from "Self-intersection at or near point", and both are different yet again from other errors. The fact that you're getting these types of errors may mean that the polygons you're indexing are not fully ISO 19107:2003 valid: while ArcGIS and PostGIS are happily going along with them, Elasticsearch is stricter with the data. In cases like that, I've heard of people having good success running the data through prepair. You might want to give that a shot.
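The first of those errors is the easiest to check yourself: GeoJSON requires a linear ring to be closed (its first and last positions are identical) and to have at least four positions. A sketch that appends the first vertex when it is missing and warns about rings that are too short, assuming one ring given as "x y" lines; `close_ring` is a hypothetical helper, and it does nothing about self-intersections.

```shell
# Hypothetical helper: ensure ONE ring is closed.
# stdin: one "x y" pair per line. stdout: the ring, with the first vertex
# appended if the last vertex differs from it. Warnings go to stderr.
close_ring() {
  awk '
    { n++; X[n] = $1; Y[n] = $2; print }
    END {
      if (n == 0) exit
      m = n
      # Numeric comparison, so "0" and "0.0" count as the same coordinate.
      if (X[1] != X[n] || Y[1] != Y[n]) { print X[1], Y[1]; m++ }
      if (m < 4) print "warning: ring has fewer than 4 positions" | "cat 1>&2"
    }'
}

# Usage: close_ring < ring.txt > ring_closed.txt
```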


(AJ) #8

Yes, thanks, I'll give it a shot for sure.
What do you reckon about the ArrayOutOfBounds ones? Would you be willing to give it a go at some point and see?


(Shane Connelly) #9

Can you post the full stack trace where you're seeing ArrayOutOfBounds and what you're doing that causes it?


(AJ) #10

Sure, I have posted this in the GitHub repo as a potential bug.


(Shane Connelly) #11

Thanks -- we'll look into it.


(AJ) #12

Regarding prepair, just to note that their improved branch that supports MultiPolygons doesn't work with big geometries: it gives a segmentation fault that was reported but is still there.


(Shane Connelly) #13

There may be other tools out there to transform the shapes; prepair is just the one I know and have seen the most success with. You could also try ST_MakeValid if you have PostGIS. I'm guessing it'll be slow or blow up, but it may be worth a try.


(AJ) #14

ST_MakeValid is already something we use a lot (I did mention we already do validation in PostGIS), and some of the geometries still fail in ES.


(Shane Connelly) #15

OK, I don't know what to suggest for you then. Maybe somebody else has ideas, but the errors you're getting around "Points of LinearRing do not form a closed linestring" and "Self-intersection at or near point" indicate that your shapes are not valid ISO 19107:2003 shapes. I've seen prepair fix those before. If prepair has a bug, you may need to submit a patch directly to prepair or find an alternative way of generating/converting valid ISO 19107:2003 shapes.
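For small test cases, a brute-force scan can show where a ring crosses itself before it is sent to Elasticsearch. A sketch under these assumptions: one ring given as "x y" lines; only proper crossings are reported (segments that merely touch or overlap collinearly are not); and it is O(n^2), so it is only practical for reduced test rings, not 100k-vertex coastlines. `find_crossings` is a hypothetical helper, and it is not the check Elasticsearch itself runs.

```shell
# Hypothetical helper: list pairs of non-adjacent segments of ONE ring that
# properly cross. Segment k joins vertex k and vertex k+1.
find_crossings() {
  awk '
    # Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear.
    function orient(ax, ay, bx, by, cx, cy,   v) {
      v = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
      return (v > 0) - (v < 0)
    }
    { n++; X[n] = $1 + 0; Y[n] = $2 + 0 }
    END {
      closed = (n > 1 && X[1] == X[n] && Y[1] == Y[n])
      for (i = 1; i < n - 2; i++)
        for (j = i + 2; j < n; j++) {
          # In a closed ring the first and last segments share a vertex.
          if (closed && i == 1 && j == n - 1) continue
          o1 = orient(X[i], Y[i], X[i+1], Y[i+1], X[j], Y[j])
          o2 = orient(X[i], Y[i], X[i+1], Y[i+1], X[j+1], Y[j+1])
          o3 = orient(X[j], Y[j], X[j+1], Y[j+1], X[i], Y[i])
          o4 = orient(X[j], Y[j], X[j+1], Y[j+1], X[i+1], Y[i+1])
          # Each segment has the other segment endpoints on opposite sides.
          if (o1 * o2 < 0 && o3 * o4 < 0) print "segments " i " and " j " cross"
        }
    }'
}

# Usage: find_crossings < small_test_ring.txt
```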


(AJ) #16

Hey Shane, I am not too worried about these errors; we will most likely find a way to sort them out. Thanks a lot for your input on this.
The critical issue for me is the 'ArrayOutOfBounds' type of error. I really hope you or the team could check whether it's a bug or something we could find a way around. I have seen others report this before. We are working on simplifying some test cases, but it will take a while before I can report anything back (we need to find a compromise on the level of simplification), simply because working with these complicated geometries takes quite some time.
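When building such test cases, it can help to send the bulk file in small pieces so that the document triggering a failure can be pinned down. In the index-only _bulk format used earlier in the thread, each document takes exactly two lines (action line plus source line), so splitting on an even line count keeps the pairs together. A sketch using POSIX split; `split_bulk` and the chunk prefix are hypothetical names.

```shell
# Hypothetical helper: split an index-only _bulk file into chunks of
# N documents (2*N lines) so failing documents can be isolated.
# $1 = bulk file, $2 = documents per chunk (default 1), $3 = output prefix.
split_bulk() {
  docs="${2:-1}"
  # -a 5 allows up to 26^5 chunk files; the default suffix length of 2 allows only 676.
  split -a 5 -l $((docs * 2)) "$1" "${3:-bulk_chunk_}"
}

# Usage: one document per request, reporting chunks whose response has errors.
#   split_bulk speedlimits_filtered.json 1
#   for f in bulk_chunk_*; do
#     curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'http://127.0.0.1:9200/_bulk' \
#       --data-binary "@$f" | grep -q '"errors":true' && echo "$f has failures"
#   done
```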


(system) #17

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.