We occasionally see the following exception. It seems to depend on whether or not the search contains non-ASCII Unicode characters.
{
  "error": {
    "root_cause": [
      {
        "type": "too_complex_to_determinize_exception",
        "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "redacted",
        "node": "pxuPrKQMSdq1c1p8Cwrb9Q",
        "reason": {
          "type": "too_complex_to_determinize_exception",
          "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
        }
      }
    ]
  },
  "status": 500
}
We limit our search input field to 250 characters. If I search for 250 'a' characters it works fine. However, if I search for 250 'วั' characters we see the exception above. Why is this the case? How can I impose a safe limit on my input field if the limit depends on the character encoding?
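For reference, here is a minimal sketch (plain Python; the variable names are ours) of why the same 250-character limit produces very different inputs depending on the script. 'วั' is actually two Unicode code points, ว (U+0E27) plus the combining vowel ั (U+0E31), and each of those takes three bytes in UTF-8:

s_ascii = "a" * 250
s_thai = "วั" * 250  # each "วั" is two code points: U+0E27 + U+0E31

# len() on a Python str counts Unicode code points
print(len(s_ascii), len(s_ascii.encode("utf-8")))  # 250 code points, 250 bytes
print(len(s_thai), len(s_thai.encode("utf-8")))    # 500 code points, 1500 bytes

So an input field that counts visible characters can let through two to six times as many code points or bytes as intended; capping the query by code points (or UTF-8 bytes) rather than rendered characters would make the limit encoding-independent. For what it's worth, the 10000-state ceiling in the error is Lucene's default determinization limit; if the search is executed as a regexp query, that limit is exposed as the query's max_determinized_states parameter.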