6.4 to 6.7.1: MultiPolygon cannot be indexed anymore using new geo_shape field

Hi,
I've migrated my clusters from ES 6.4 to ES 6.7.1.
On these I have some indices which have some geo_shape fields. The mappings were defined using the "old" geo_shape parameters:

{
    "location": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "50.0m"
    }
 }

I migrated all those fields to the "new" geo_shape field format by simply changing them to:

{
    "location": {
        "type": "geo_shape"
    }
 }

All the impacted indices were then re-indexed to the new mappings using the reindex API.
Once the reindex was done I noticed that some documents couldn't be reindexed to the new field.

This can be reproduced on a 6.7 cluster (see this gist for the exact geojson shape):

PUT places-old
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "location": {
          "type": "geo_shape",
          "tree": "quadtree",
          "precision": "50.0m"
        }
      }
    }
  }
}

POST places-old/_doc
{
  "location": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [
               //... see linked shape ...//
          ]
       ]
    ]
  }
}

This works without issues. Incidentally, the standard "upper-case" geometry type notation (e.g. "MultiPolygon") now seems to be accepted, which wasn't the case in ES 6.4, and I can't find anything about this change in the documentation.

{
  "_index" : "places-old",
  "_type" : "_doc",
  "_id" : "TU6zMGoBeEI9SuPC7thc",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Now trying to index this doc using the new geo_shape field type fails.

PUT places-new
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}

POST places-new/_doc
{
  "location": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [
               //... see linked shape ...//
          ]
       ]
    ]
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [location] of type [geo_shape]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [location] of type [geo_shape]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Unable to Tessellate shape [[51.6251554, 6.9730244] //...// [51.624605, 6.9736199] ]]. Possible malformed shape detected."
    }
  },
  "status": 400
}

The actual shape looks like this:

Although this shape is quite weird (possibly coming from a bug in my own import scripts), it seems to be valid GeoJSON (it passes the mapbox linter).

I can't find anything in the release notes or the GitHub issues suggesting that it would no longer be supported by the new geo_shape field. This shape in particular is extracted from OpenStreetMap (see here).

Therefore I'm wondering if this is an actual bug or if it's to be expected.

I index the whole planet-osm data, and this issue affects roughly 14% of the documents (~11M docs).

The Tessellator class used in Lucene to process the shape has more requirements than just being valid GeoJSON; see Tessellator (Lucene 8.0.0 API).

To quote:

  • No self intersections
  • Holes may only touch at one vertex
  • Polygon must have an area (e.g., no "line" boxes)
  • sensitive to overflow (e.g, subatomic values such as E-200 can cause unexpected behavior)

I haven't looked too deeply at your shape (humans are not the world's best JSON parsers), but maybe one of those criteria is not fulfilled.

--Alex
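As a rough illustration of two of the constraints quoted above (no self intersections, polygon must have an area), here is a minimal Python sketch that checks a single closed ring. It is not Lucene's Tessellator logic; the function names are hypothetical, the intersection test is a brute-force O(n²) comparison, and it only detects edges that properly cross (touching or overlapping edges are not reported).

```python
from itertools import combinations

def signed_area(ring):
    """Shoelace formula; ring is a closed list of (x, y) or [lon, lat] pairs."""
    return 0.5 * sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(ring, ring[1:]))

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0 or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, p3, p4):
    """True only if segments p1p2 and p3p4 cross at an interior point of both."""
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0 and
            _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)

def has_self_intersection(ring):
    """Brute-force check of every pair of edges in the ring."""
    edges = list(zip(ring, ring[1:]))
    return any(segments_cross(*e1, *e2) for e1, e2 in combinations(edges, 2))
```

A ring whose signed area is zero, or for which `has_self_intersection` returns `True`, would be a candidate for closer inspection; a ring that passes both checks can still fail for other reasons.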

Hi @spinscale,
Thank you for your answer! I've been trying to find out exactly what is causing the issue (self-intersections, holes, etc.) without success. None of the GeoJSON linters I could find helped. I also looked for a validator or equivalent in the Lucene geo package but could not find one.

Since some shapes that could be indexed in 6.4 can no longer be indexed in 6.6+, I think this should be mentioned in the geo_shape docs or in the release notes (maybe it already is and I missed it).

Hi again,
I've been investigating some more, and using mapshaper I've been able to narrow it down a bit.
I found two simplified versions of the original shape: one of them can be indexed correctly, the other cannot. See the following gists:

If I superimpose the two shapes, it looks like this:

The green shape is the one that can be indexed, the red one the "bad" one.

Comparing the two raw JSON files, it turns out that the only difference between them is one extra point in the red shape; all other points are exactly the same.

What puzzles me is that it doesn't seem to violate any of the four constraints you mentioned in your first reply. Maybe you have a clearer idea of what exactly is going on. Is it that the tessellation ends up creating too small a triangle? My understanding of the ear-clipping algorithm is shallow...

Thank you for your help.
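For readers unfamiliar with ear clipping, the following is a minimal textbook-style sketch in Python. It assumes a simple, counter-clockwise polygon without holes and without repeated vertices, and it is not Lucene's Tessellator (which also handles holes and uses further optimizations). The function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of the CCW triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0

def ear_clip(polygon):
    """Triangulate a simple CCW polygon given as an unclosed list of (x, y).

    Returns a list of triangles, each a tuple of three vertices.
    """
    verts = list(polygon)
    triangles = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            prev, cur, nxt = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(prev, cur, nxt) <= 0:
                continue  # reflex or collinear corner, cannot be an ear
            others = (v for v in verts if v not in (prev, cur, nxt))
            if any(point_in_triangle(v, prev, cur, nxt) for v in others):
                continue  # another vertex blocks this candidate ear
            triangles.append((prev, cur, nxt))
            del verts[i]  # clip the ear and start over on the smaller polygon
            break
        else:
            raise ValueError("no ear found: polygon is degenerate or not simple")
    triangles.append(tuple(verts))
    return triangles
```

In this sketch a degenerate input (for example, all vertices on one line) ends with the "no ear found" error, which is loosely analogous to the "Unable to Tessellate shape" failure, although the actual cause inside Lucene may differ.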

Hi,

I have been looking into this problem and I think there is nothing wrong with the polygon; it is a bug in the tessellator code. It seems to have problems finding the right tessellation when holes touch other edges, as in your case.

I have opened an issue in Lucene to improve this situation:

https://issues.apache.org/jira/browse/LUCENE-8775

Hi @Ignacio_Vera,

Thank you for your quick answer, it's a relief. Just out of curiosity, did you use any tool (that I could use) while digging into this issue? I found myself quite helpless trying to understand what was going on.

I'll be watching the issue and the PR and let you know if there are still some shapes that cannot be indexed after the fix has landed.

Hi,

We are working with a patched version of Elasticsearch containing this Lucene PR: https://issues.apache.org/jira/browse/LUCENE-8775

Sadly, at least one edge case still remains. It can happen with holes in a polygon. It's quite hard to reproduce and seems to be related to the implementation of this paper (fetchHoleBridge method) https://www.geometrictools.com/Documentation/TriangulationByEarClipping.pdf

An example of an invalid shape: https://gist.github.com/clement-tourriere/2a3110946338107b8f52bdcbcb6b8bed

If you remove the hole, or one vertex in this example, everything seems to work fine.

Thanks @clement_tourriere for sharing your case. It seems that there are still some edge cases that fail, in particular when a hole shares a vertex with the polygon, as in your case. I will look into it in the next few days.

I updated the lucene PR with a fix for the given shape.

I expect more corner cases. What is clear to me is that the algorithm has problems when a hole shares a vertex with the polygon. A correct fix would be to merge those holes beforehand using the shared vertex. The problem is that doing so in a performant way seems to be challenging.
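To find out which documents might hit this corner case, a small Python sketch like the one below can list the holes that share a vertex with the outer ring. It assumes GeoJSON Polygon coordinates as input, compares positions by exact equality, and does not detect a hole vertex that merely lies on an outer edge. The function name is hypothetical.

```python
def shared_vertices(polygon):
    """Map each hole index to the vertices it shares with the outer ring.

    polygon: GeoJSON Polygon coordinates, i.e. [outer_ring, hole_1, ...],
    where each ring is a list of [lon, lat] positions.
    """
    outer = {tuple(p) for p in polygon[0]}
    result = {}
    for i, hole in enumerate(polygon[1:], start=1):
        common = sorted(outer & {tuple(p) for p in hole})
        if common:
            result[i] = common
    return result
```

For a MultiPolygon, the same function could be applied to each member polygon in turn.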

Thank you very much @Ignacio_Vera.
We will try the new version in the next few days and will tell you if we find other edge cases.

Hello again @Ignacio_Vera.

I want to report another problem related to the new LatLonShape system. It is not closely related to this problem, but it comes from the refactoring you made here: https://github.com/apache/lucene-solr/commit/ce9a8012c080dbf2a96a6755a0b7048ab5739419

You have created the Rectangle2D class to separate query and geometry logic. The problem is the hashCode method in this new class, which calls super.hashCode(). That is the one from Object, and it changes for every new Rectangle instantiation (i.e. for each query). As a consequence, caching cannot work for LatLonShapeBoundingBoxQuery since the hashCode is always different.
A simple fix would be to create a classHash for the Rectangle2D instead of using the Object one.
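The effect described above can be illustrated with a small Python analogue (Python objects, like Java's Object, default to an identity-based hash). This is not Lucene code: the class names, the bounds layout and the dictionary used as a cache are all assumptions made for the illustration.

```python
class IdentityRect:
    """Hypothetical query bounds relying on the default identity-based hash."""
    def __init__(self, min_x, max_x, min_y, max_y):
        self.bounds = (min_x, max_x, min_y, max_y)

class ValueRect(IdentityRect):
    """Same data, but equality and hash are derived from the bounds."""
    def __eq__(self, other):
        return type(other) is type(self) and other.bounds == self.bounds

    def __hash__(self):
        return hash((type(self).__name__, self.bounds))

def cache_misses(rect_cls, n=3):
    """Issue n identical 'queries' and count how many miss the cache."""
    cache, misses = {}, 0
    for _ in range(n):
        key = rect_cls(0.0, 1.0, 0.0, 1.0)  # a fresh object per query
        if key not in cache:
            misses += 1
            cache[key] = "result"
    return misses
```

With the identity-based hash every identical query misses the cache, whereas the value-based variant misses only on the first one.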

Thanks for reporting! I have filed the issue. I decided on a different implementation to keep symmetry with other query implementations:

https://issues.apache.org/jira/browse/LUCENE-8831

Hi @Ignacio_Vera,
Thank you for working on this! I saw that your changes have been merged on the Lucene side.

By any chance, do you know when they will land in Elasticsearch? (I'm not sure where I could find this info.)

We do not disclose information about future releases.

This change needs to be released in Lucene first, but once it is released, it should not take too long afterwards.

OK! We'll be waiting! Thank you!
