Ignore_malformed broken for geo_shape

I have set mappings for my index as such:

"geometry": {
    "type": "geo_shape",
    "ignore_malformed": true
}

When I bulk POST documents, one fails due to malformed geometry. I have set the "ignore_malformed" parameter to true, as mentioned in the docs.

Well I still get an exception thrown:

elasticsearch.helpers.errors.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'buildings', '_type': '_doc', '_id': 'VPSlEX0BWuwsi5mPMOb1', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'invalid_shape_exception', 'reason': 'Cannot determine orientation: signed area equal to 0. Points are collinear or polygon self-intersects.'}}, 'data': {'fid': 1761, 'OBJECTID': 280, 'Latitude': etc; etc

What does the document that failed look like?

{"fid": 1761, "OBJECTID": 280, "Latitude": -####, "Longitude": ####, "searchable_building": "Building Name", "geometry": {"type": "MultiPolygon", "coordinates": [[[[#####]]]]}}

Sorry, I can't post the actual coordinates etc. because it's sensitive data, but nothing stands out to me about the geo_shape.
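Since the coordinates can't be shared, one way to narrow down the offending ring locally is to compute each ring's signed area, which is what the error message refers to. The following is a rough sketch, not Elasticsearch's actual validation logic: the function names and the `tolerance` parameter are made up for illustration, and it assumes GeoJSON-style closed rings (first position repeated at the end).

```python
def ring_signed_area(ring):
    """Shoelace formula: signed area of a closed ring of [x, y, ...] positions."""
    total = 0.0
    for p1, p2 in zip(ring, ring[1:]):
        total += p1[0] * p2[1] - p2[0] * p1[1]
    return total / 2.0


def find_zero_area_rings(geometry, tolerance=0.0):
    """Return (polygon_index, ring_index) pairs for rings whose area is ~0.

    `tolerance` is a hypothetical knob; real longitude/latitude values are
    floats, so a small positive tolerance may be needed in practice.
    """
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("unsupported geometry type: %s" % geometry["type"])

    flagged = []
    for p_idx, polygon in enumerate(polygons):
        for r_idx, ring in enumerate(polygon):
            if abs(ring_signed_area(ring)) <= tolerance:
                flagged.append((p_idx, r_idx))
    return flagged


# Made-up example data: a valid unit square plus a collinear "polygon".
example = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        [[[0, 0], [1, 1], [2, 2], [0, 0]]],
    ],
}
print(find_zero_area_rings(example))
```

A ring flagged here is either collinear or self-intersecting in a way that cancels its area, which matches the wording of the exception, but a ring that passes this check may still be rejected for other reasons.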

I figure this may be due to the Python library having integration issues with Elasticsearch. By default I'd expect the mapping "ignore_malformed: true" to let bulk indexing continue even when it hits fields with geo_shape errors.

But it seems that in the Python library an exception is raised by default, crashing your program regardless of your mapping, which is counterintuitive.

I've worked around this issue by passing the parameter "raise_on_error=False" in my Python code. It now runs to completion even with problem documents.

E.g.:

from elasticsearch import helpers
helpers.bulk(client=ES, actions=actions, raise_on_error=False)
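With raise_on_error=False, the bulk helper returns a tuple of the success count and a list of per-document error entries instead of raising, so failed documents are easy to miss unless that list is inspected. Here is a minimal sketch of summarizing it, assuming the entries have the same shape as the error shown above. `summarize_bulk_errors` is a hypothetical helper, not part of the Python client, and the sample data is a trimmed copy of that error.

```python
def summarize_bulk_errors(errors):
    """Return (doc_id, error_type, reason) for each failed bulk item."""
    summary = []
    for item in errors:
        # Each item is keyed by the operation type, e.g. {"index": {...}}.
        for details in item.values():
            error = details.get("error")
            if isinstance(error, dict):
                # Prefer the root cause when one is reported.
                cause = error.get("caused_by") or error
                summary.append(
                    (details.get("_id"), cause.get("type"), cause.get("reason"))
                )
            else:
                # Assumption: the error may also arrive as a plain string.
                summary.append((details.get("_id"), None, str(error)))
    return summary


# In real use the list would come from the bulk helper, for example:
#   success, errors = helpers.bulk(client=ES, actions=actions, raise_on_error=False)
sample_errors = [
    {
        "index": {
            "_index": "buildings",
            "_id": "VPSlEX0BWuwsi5mPMOb1",
            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse",
                "caused_by": {
                    "type": "invalid_shape_exception",
                    "reason": "Cannot determine orientation: signed area equal to 0. "
                    "Points are collinear or polygon self-intersects.",
                },
            },
        }
    }
]

for doc_id, error_type, reason in summarize_bulk_errors(sample_errors):
    print(doc_id, error_type, reason)
```

Logging these tuples, or writing the failed documents to a separate file, keeps the run going while still leaving a record of what was skipped.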

Just to add a little bit of detail.

In an ingest pipeline, ignore malformed means continue processing the pipeline. There could be many steps after that one, so it means don't fail on that step and stop the whole processing.

Ingest pipelines run before the actual document is written into the index.

The mapping exception takes place when the document is actually written to the index.

My suspicion is that the document is malformed in such a way that a mapping exception occurred when it went to write the actual document, which I think you may already have figured out. I just thought I would remind anyone reading this that pipelines run before writing, and mapping exceptions happen on writing.
