Length filter: array index out of bounds exception

I encountered a strange issue where a query fails depending on the position of a term, but only when the length filter is active (see below):

  1. works:
GET test/_search?filter_path=**.productNumber
{
  "query": {
    "match": {
      "productNumber": {
        "query": "abc def ghij 3d"
      }
    }
  }
}
  2. fails:
GET test/_search?filter_path=**.productNumber
{
  "query": {
    "match": {
      "productNumber": {
        "query": "abc def 3d ghij"
      }
    }
  }
}

Exception:

          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 0 out of bounds for length 0"
          }

ES versions tested: 7.5, 7.6

Index settings:

PUT test
{
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "filter": {
        "length_min_2": {
          "type": "length",
          "min": 2
        },
        "word_split_product_number": {
          "type": "word_delimiter_graph",
          "split_on_numerics": true,
          "generate_number_parts": true,
          "catenate_words": true,
          "catenate_numbers": true,
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_split_product_number_analyzer": {
          "filter": [
            "lowercase",
            "word_split_product_number",
            "length_min_2"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "productNumber": {
        "type": "text",
        "analyzer": "word_split_product_number_analyzer"
      }
    }
  }
}
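
To see what the analyzer produces for the failing text, the token stream can be inspected with _analyze (using the analyzer defined above). The word_delimiter_graph filter splits "3d" into "3" and "d" (plus the catenated/preserved "3d"), and the length filter then drops the one-character tokens out of the middle of the token graph, which is presumably what trips up the query:

GET test/_analyze
{
  "analyzer": "word_split_product_number_analyzer",
  "text": "abc def 3d ghij"
}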

Test docs:

PUT test/_bulk
{"index":{}}
{"productNumber":"ABC-DEF-GHIJ-3A"}
{"index":{}}
{"productNumber":"ABC-DEF-GHIJ-3B"}
{"index":{}}
{"productNumber":"ABC-DEF-GHIJ-3C"}
{"index":{}}
{"productNumber":"ABC-DEF-GHIJ-3D"}

I found the following workaround, but it would be great to have the length filter working too.
Instead of:

        "length_min_2": {
          "type": "length",
          "min": 2
        },

use these:

        "stop_empty": {
          "type": "stop",
          "stopwords": [ "" ]
        },
        "pattern_length_min_2": {
          "type": "pattern_replace",
          "pattern": "^.$",
          "replacement": ""
        },

...
      "analyzer": {
        "word_split_product_number_analyzer": {
          "filter": [
            "lowercase"
            ,"word_split_product_number"
            ,"pattern_length_min_2"
            ,"stop_empty"
            ,"unique"
          ],
          "tokenizer": "whitespace"
        }
      }

The "workaround" above does not work after all :frowning:
I'm getting the same error as above. It seems I made some mistakes during my earlier testing...

UPDATE:
Today's workaround: use the combination word_delimiter (not the graph variant!) + flatten_graph.
So, is this a bug in the graph token stream handling?
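
Roughly, the analyzer now looks like this (same filter options as in the original settings, only the type changed to word_delimiter and flatten_graph added to the chain; the exact position of flatten_graph is just how I set it up, treat this as a sketch):

      "filter": {
        "length_min_2": {
          "type": "length",
          "min": 2
        },
        "word_split_product_number": {
          "type": "word_delimiter",
          "split_on_numerics": true,
          "generate_number_parts": true,
          "catenate_words": true,
          "catenate_numbers": true,
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_split_product_number_analyzer": {
          "filter": [
            "lowercase",
            "word_split_product_number",
            "flatten_graph",
            "length_min_2"
          ],
          "tokenizer": "standard"
        }
      }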

Any feedback from the ES engineers? Thanks!

I opened https://github.com/elastic/elasticsearch/issues/54434, as the least that should happen is either a proper exception or a fix :slight_smile:


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.