Fuzzy search "too complex to determinize" exception with Unicode characters

We occasionally see the following exception. It seems to depend on whether the search contains non-ASCII Unicode characters.

{
  "error": {
    "root_cause": [
      {
        "type": "too_complex_to_determinize_exception",
        "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "redacted",
        "node": "pxuPrKQMSdq1c1p8Cwrb9Q",
        "reason": {
          "type": "too_complex_to_determinize_exception",
          "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
        }
      }
    ]
  },
  "status": 500
}

We limit our search input field to 250 characters. If I search for a string of 250 "a" characters, it works fine. However, if I use 250 "วั" characters, we see this exception. Why is this the case? How can I impose a safe limit on my input field if it depends on the character encoding?
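To make the encoding difference concrete: Lucene stores indexed terms as UTF-8 bytes, and a plausible explanation (an assumption, not confirmed in this thread) is that the automaton built for a fuzzy query is ultimately matched against those bytes, so each 3-byte Thai code point contributes more states than a 1-byte ASCII letter. The sketch below compares code-point count with UTF-8 byte length and shows a hypothetical input check that budgets by bytes instead of characters. `MAX_QUERY_BYTES` and `within_byte_budget` are illustrative names, and 250 bytes is not a verified safe limit.

```python
# Sketch: compare code-point count with UTF-8 byte length for query input.
# MAX_QUERY_BYTES and within_byte_budget are hypothetical names; the 250-byte
# limit is an illustrative assumption, not a verified safe threshold.

MAX_QUERY_BYTES = 250


def describe(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


def within_byte_budget(text: str, limit: int = MAX_QUERY_BYTES) -> bool:
    """True if the UTF-8 encoding of text fits within the byte limit."""
    return len(text.encode("utf-8")) <= limit


if __name__ == "__main__":
    ascii_input = "a" * 250
    # 250 code points: ว (U+0E27) and ั (U+0E31) alternate, 3 UTF-8 bytes each
    thai_input = "วั" * 125
    print(describe(ascii_input))  # (250, 250)
    print(describe(thai_input))   # (250, 750)
    print(within_byte_budget(ascii_input), within_byte_budget(thai_input))  # True False
```

A byte budget is only a proxy: the automaton size also depends on how the analyzer splits the input into terms, the edit distance chosen by `fuzziness`, and `prefix_length`, so any limit would still need to be tested against the actual index.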

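Here is a more specific example; this time the query fails with even fewer characters. The request and its error response come first, followed by the same query using "a" characters, which succeeds: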

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "body_text.analyzed_unigram"
            ],
            "query": "วัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวั",
            "operator": "or",
            "fuzziness": "auto",
            "prefix_length": 3
          }
        }
      ]
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "too_complex_to_determinize_exception",
        "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "article_search_production_v2",
        "node": "pxuPrKQMSdq1c1p8Cwrb9Q",
        "reason": {
          "type": "too_complex_to_determinize_exception",
          "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
        }
      }
    ]
  },
  "status": 500
}
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "body_text.analyzed_unigram"
            ],
            "query": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            "operator": "or",
            "fuzziness": "auto",
            "prefix_length": 3
          }
        }
      ]
    }
  }
}
{"took":6,"timed_out":false,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0},"hits":{"total":202,"max_score":0.0,"hits":[]...
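Both example queries set `"fuzziness": "auto"`. The Elasticsearch documentation describes the default `AUTO` (equivalent to `AUTO:3,6`) as: terms of length 0..2 must match exactly, 3..5 allow one edit, and longer terms allow two edits. The sketch below applies that rule; `auto_edit_distance` is a hypothetical helper, counting length in code points is an assumption, and it assumes each query string is analyzed into a single term (which depends on the `body_text.analyzed_unigram` analyzer). Under those assumptions, long strings of either script get the same maximum edit distance of 2, which suggests the edit distance alone does not explain why only the Thai query fails.

```python
# Sketch of the documented fuzziness "AUTO" rule (default AUTO:3,6).
# auto_edit_distance is a hypothetical helper; measuring length in code points
# is an assumption, as is treating the whole query string as one term.


def auto_edit_distance(term: str, low: int = 3, high: int = 6) -> int:
    """Return the maximum edit distance that AUTO:low,high allows for term."""
    length = len(term)  # number of Unicode code points in a Python str
    if length < low:
        return 0  # must match exactly
    if length < high:
        return 1  # one edit allowed
    return 2      # two edits allowed


if __name__ == "__main__":
    # Illustrative samples, not the exact strings from the queries above
    for sample in ("ab", "abcd", "a" * 40, "วั" * 20):
        print(len(sample), auto_edit_distance(sample))
```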
