Too complex to determinize exception: "Determinizing automaton with 13186 states and 23192 transitions would result in more than 10000 states"

Hi, there. I perform searching in the index that contains several documents with long string inside (about 3 thousand chars). And get the exception if my query contains symbols of the Cyrillic and Latin symbols.
Query:

_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "wildcard": {
                  "packageContent.raw": {
                    "value": "*husbands ask repeated resolved but laughter debating. she end cordial visitor noisier fat subject general picture. or if offering confined entrance no. nay rapturous him see something residence. highly talked do so vulgar. her use behaved spirits and natural attempt say feeling. exquisite mr incommode immediate he something ourselves it of. law conduct yet chiefly beloved examine village proceed. an so vulgar to on points wanted. not rapturous resolving continued household nsпппппппппппппппппппп*"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 50,
  "_source": [
    "packageContent"
  ]
}

Do you have any idea what is the root cause?

This is a side effect of the internal implementation of wildards as NFAs that are determinized to DFAs. The determinization process can take a up a crazy amount of memory so we limit the size of the determinized DFAs so we don't fill up memory.

I consider this a limitation of wildcard queries though I don't think it is one we're likely to remove any time soon.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.