Percolation with a document containing a huge geo_shape

Hello, I get a strange error when using percolation with a document containing a huge geo_shape property.
I have an index containing documents (dt_prior_lake_database_continent_basin). These documents have a mapped geo_shape property (granules.basin.placemark).
I have another index containing search requests (sr_prior_lake_database_continent_basin) for these documents (it contains one query per workflow; a workflow is another document).
This index contains the same mappings as the "dt_" one, plus a query property mapped as percolator.
When I search the "sr_" index with the id of a "dt_" document containing a huge geo_shape (a badly constructed multi-polygon with many polygons/rectangles covering each other, which I cannot currently simplify due to its complexity), I get an error. I don't get this error with documents that have no such complex geo_shape, nor if I remove the geo_shape mapping.
I get this error even if the "sr_" index is empty.
So, here is the search request:

GET sr_prior_lake_database_continent_basin_test/_search
{
  "size": 10000,
  "query": {
    "constant_score": {
      "filter": {
        "percolate": {
          "document_type": null,
          "field": "query",
          "index": "dt_prior_lake_database_continent_basin",
          "id": "SWOT_PLD_GR_01_20000101T000000_20991231T235959_20210616T143757_v001.sqlite"
        }
      }
    }
  }
}

Here is the error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: bytes can be at most 32766 in length; got 162990",
        "index_uuid" : "DRXusyirQW26h2xAeY-hiQ",
        "index" : "sr_prior_lake_database_continent_basin_test"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "sr_prior_lake_database_continent_basin_test",
        "node" : "gl2TPy5SQd263YOIx4UDZw",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: bytes can be at most 32766 in length; got 162990",
          "index_uuid" : "DRXusyirQW26h2xAeY-hiQ",
          "index" : "sr_prior_lake_database_continent_basin_test",
          "caused_by" : {
            "type" : "max_bytes_length_exceeded_exception",
            "reason" : "bytes can be at most 32766 in length; got 162990"
          }
        }
      }
    ]
  },
  "status" : 400
}

Here is "sr_" mapping:

PUT sr_prior_lake_database_continent_basin_test 
{
  "mappings" : {
    "properties" : {
        "granules" : {
          "properties" : {
            "basin" : {
              "properties" : {
                "placemark": {
                  "type": "geo_shape"
                }
              }
            }
          }
        },
        "query" : {
          "type" : "percolator"
        }
      }
  }
}

And finally, here is the document:

{
  "_index" : "dt_prior_lake_database_continent_basin",
  "_type" : "_doc",
  "_id" : "SWOT_PLD_SA_02_20000101T000000_20991231T235959_20210616T143757_v001.sqlite",
  "_version" : 12,
  "_seq_no" : 71,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "dataType" : "prior_lake_database_continent_basin",
    "id" : "SWOT_PLD_SA_02_20000101T000000_20991231T235959_20210616T143757_v001.sqlite",
    "uri" : "file:/work/ALT/ssalto_dev/chronos/swot/chronos-valid-dev2/input/SWOT_PLD_SA_02_20000101T000000_20991231T235959_20210616T143757_v001.sqlite",
    "valid" : true,
    "granules" : {
      "basin" : {
        "type" : "BASIN",
        "id" : "CT6B2",
        "parentId" : "CT6",
        "ascendIds" : null,
        "previousId" : null,
        "nextId" : null,
        "toBeProcessed" : true,
        "@ingestDate" : "2022-01-11T15:17:20Z",
        "placemark" : {
          "type" : "multipolygon",
          "coordinates" : 
            [[[[ a lot.... but too much for permitting topic creation ]]]]
        },
        "continentCode" : "SA"
      },
      "segment" : {
        "type" : "SEGMENT",
        "id" : "2010-01-01T00:00:00Z_2050-01-01T00:00:00Z",
        "toBeProcessed" : true,
        "segmentEndTime" : "2050-01-01T00:00:00Z",
        "segmentStartTime" : "2010-01-01T00:00:00Z"
      }
    },
    "nature" : "EXTERNAL",
    "manuallyImported" : false,
    "@ingestDate" : "2022-01-13T14:07:49Z",
    "@chronosInstance" : "SWOT",
    "creationTime" : "2021-06-16T14:37:57.000000Z",
    "version" : 1
  }
}

It seems that the geo_shape property isn't being handled as a geo_shape but rather as a keyword or something like that, doesn't it?

I have simplified all multi-polygons into correct ones (without overlapping polygons), but on some huge geo_shapes (for example, a continent) I still get the same error.
Could someone confirm whether this is a known bug?
I just want to avoid opening a bug report on Elasticsearch if it isn't one.
I can provide the JSON of the problematic geometry.

See https://github.com/elastic/elasticsearch/issues/83418.

There was a bug in Lucene's MemoryIndex that imposed a maximum size on binary doc values. This has been fixed upstream and will be released in Lucene 9.1.

For the time being, the workaround is to disable the binary doc values, in this case by setting doc_values: false on the geo_shape field.
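
For reference, here is a minimal sketch of that workaround applied to the "sr_" mapping posted above (index and field names are taken from this thread; only the doc_values line is new). Note that doc_values cannot be changed on an existing field, so the index has to be created (or recreated) with this mapping:

PUT sr_prior_lake_database_continent_basin_test
{
  "mappings" : {
    "properties" : {
      "granules" : {
        "properties" : {
          "basin" : {
            "properties" : {
              "placemark": {
                "type": "geo_shape",
                "doc_values": false
              }
            }
          }
        }
      },
      "query" : {
        "type" : "percolator"
      }
    }
  }
}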

Yes, I saw the bug status in Elasticsearch.
Thank you for the quick response. Setting doc_values to false did indeed suppress the error.

