Geo_shape size and elastic performance

We have an index that contains about 5 million documents. About 20% will have a geo_shape field that contains polygon data. I need to support intersect queries on these geo_shapes. Some of these polygons are very complex (e.g., the coastline of Chile) and have resulted in long indexing time and in some cases instance failure. As a result, we are simplifying these polygons prior to indexing. The question is, what should we target as the maximum number of vertices in our geo_shape polygons? Simplifying the polygon reduces indexing time (and perhaps query time???), but at a cost of detail.
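For context, the kind of pre-indexing simplification described above can be sketched roughly as follows. This is only an illustration, not our production code: it uses a naive Visvalingam-Whyatt style reduction (repeatedly dropping the vertex that contributes the least area), the names `simplify_ring` and `max_vertices` are hypothetical, and it treats lon/lat as flat x/y. It also does nothing to prevent self-intersections, so the result would still need validating before indexing.

```python
def triangle_area(a, b, c):
    """Planar area of triangle abc (treats lon/lat as flat x/y)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_ring(ring, max_vertices):
    """Drop the least significant vertices until the ring fits the budget.

    `ring` is a closed GeoJSON linear ring (first and last positions equal).
    Naive O(n^2) loop: fine for a sketch, slow for very large rings.
    """
    if max_vertices < 3:
        raise ValueError("a polygon ring needs at least 3 distinct vertices")
    pts = [tuple(p) for p in ring[:-1]]  # work on the open ring
    while len(pts) > max_vertices:
        n = len(pts)
        # Pick the vertex whose triangle with its two neighbours is smallest.
        idx = min(
            range(n),
            key=lambda i: triangle_area(pts[i - 1], pts[i], pts[(i + 1) % n]),
        )
        del pts[idx]
    # Close the ring again before returning it.
    return [list(p) for p in pts] + [list(pts[0])]


# A square with one redundant vertex in the middle of its bottom edge.
square = [[0, 0], [5, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
simplified = simplify_ring(square, max_vertices=4)
```

The trade-off in the question is the value passed as `max_vertices`: a lower budget means less work at indexing time but a coarser outline.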

Hey @rgwozdz!

Can you provide the Elasticsearch version you are using as well as the geo_shape mapping? Just want to make sure we know which indexing strategy you are using.

Thanks!


Hello @Ignacio_Vera,

We are using v7.9.0. Here is the mapping (I've limited it to the polygon field):

{
  "myindex": {
    "mappings": {
      "dynamic": "false",
      "properties": {
        "enrichments": {
          "dynamic": "false",
          "properties": {
            "datasetBoundary": {
              "dynamic": "false",
              "properties": {
                "center": {
                  "type": "geo_point"
                },
                "geometry": {
                  "type": "geo_shape"
                },
                "size": {
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }
  }
}

Let me know if there is anything else I can provide.
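In case it helps, here is roughly what the intersect queries against that field look like, written as a Python helper that builds the request body. The helper name `intersects_query` and the sample envelope are made up for illustration; the field path comes from the mapping above, and the body follows the standard `geo_shape` query with `relation` set to `intersects`.

```python
import json

# Full path to the geo_shape field, taken from the mapping above.
FIELD = "enrichments.datasetBoundary.geometry"


def intersects_query(geojson_shape, field=FIELD):
    """Build a search body matching documents whose shape intersects the input."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": geojson_shape,
                            "relation": "intersects",
                        }
                    }
                }
            }
        }
    }


# Hypothetical search area: an envelope given as [[minLon, maxLat], [maxLon, minLat]].
search_area = {"type": "envelope", "coordinates": [[-76.0, -17.0], [-66.0, -56.0]]}
body = intersects_query(search_area)
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the `_search` endpoint of the index.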

Thanks @rgwozdz,

The question is, what should we target as the maximum number of vertices in our geo_shape polygons?

This is a hard question, because indexing time depends not only on the number of edges but also on the number of holes and how they are laid out. It is easy to show that a regular polygon with 100K edges and a simple hole is processed pretty fast, but on the other hand I have seen polygons with 20K edges and almost K holes take several seconds to process.

My expectation is that Elasticsearch should efficiently handle polygons whose edge count is in the low tens of thousands, counting the edges of the holes as well.
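To see where a given shape falls relative to that rough figure, a small script along these lines could count edges and holes in a GeoJSON geometry before it is indexed. It is only a sketch: `count_edges` is a hypothetical helper, it handles just `Polygon` and `MultiPolygon`, and it assumes every ring is closed (first position repeated at the end), as GeoJSON requires.

```python
def ring_edges(ring):
    """A closed ring with n positions (first == last) has n - 1 edges."""
    return max(len(ring) - 1, 0)


def count_edges(geometry):
    """Return total edges (outer rings plus holes) and the number of holes."""
    kind = geometry["type"]
    if kind == "Polygon":
        polygons = [geometry["coordinates"]]
    elif kind == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {kind}")

    edges = 0
    holes = 0
    for rings in polygons:
        # rings[0] is the outer boundary; any further rings are holes.
        edges += sum(ring_edges(ring) for ring in rings)
        holes += max(len(rings) - 1, 0)
    return {"edges": edges, "holes": holes}


# A square with one triangular hole.
sample = {
    "type": "Polygon",
    "coordinates": [
        [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
        [[2, 2], [4, 2], [3, 4], [2, 2]],
    ],
}
stats = count_edges(sample)
```

Reporting holes separately matters because, as noted above, the edge count alone does not predict how long a polygon takes to process.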

in some cases instance failure

Would it be possible to share a failing instance? I would love to see whether it uncovers a bug in Lucene's Tessellator.

Thank you for your thoughts. This is very helpful. The polygons we are using in our current project actually have no holes as they are derived as a concave hull around the original dataset (multiple points, lines or polygons).

Unfortunately, the outage we had related to a possibly too-large polygon occurred some time ago and before I was brought on to the project. But I will reach out if I run into anything similar as I move ahead.