We occasionally see the following exception. It seems to depend on whether the search contains non-ASCII Unicode characters.
{
"error": {
"root_cause": [
{
"type": "too_complex_to_determinize_exception",
"reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "redacted",
"node": "pxuPrKQMSdq1c1p8Cwrb9Q",
"reason": {
"type": "too_complex_to_determinize_exception",
"reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
}
}
]
},
"status": 500
}
We limit our search input field to 250 characters. If I search for 250 repetitions of a, it works fine. However, if I search for 250 repetitions of วั, we see this exception. Why is this the case? How can I impose a safe limit on my input field if the limit depends on the character encoding?
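To make the question about "250 characters" concrete, here is a minimal Python sketch of how the same input can be measured in different ways. The helper name `measure` is hypothetical and not part of our application, and I am not claiming that any of these counts is the one Elasticsearch uses internally. It only shows that วั is two Unicode code points (a base consonant plus a combining vowel mark), so 250 repetitions of it are not the same size as 250 repetitions of a.

```python
import unicodedata


def measure(text: str) -> dict:
    """Return several notions of 'length' for the same string (hypothetical helper)."""
    return {
        # Number of Unicode code points (what Python's len() counts).
        "code_points": len(text),
        # Number of bytes when the string is encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Number of UTF-16 code units (what Java and JavaScript string lengths count).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Code points whose general category is a mark (Mn, Mc, Me).
        "combining_marks": sum(
            1 for ch in text if unicodedata.category(ch).startswith("M")
        ),
    }


ascii_input = "a" * 250
thai_input = "วั" * 250

print(measure(ascii_input))
print(measure(thai_input))
```

If the input limit is enforced in a language that counts UTF-16 code units or code points, a "250 character" limit admits only 125 repetitions of วั, whereas a limit based on what the user perceives as one character would admit 250 of them.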
Here is a more specific example. This time it fails with even fewer characters:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"body_text.analyzed_unigram"
],
"query": "วัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวัวั",
"operator": "or",
"fuzziness": "auto",
"prefix_length": 3
}
}
]
}
}
}
{
"error": {
"root_cause": [
{
"type": "too_complex_to_determinize_exception",
"reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "article_search_production_v2",
"node": "pxuPrKQMSdq1c1p8Cwrb9Q",
"reason": {
"type": "too_complex_to_determinize_exception",
"reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
}
}
]
},
"status": 500
}
For comparison, the same query with a in place of วั succeeds:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"body_text.analyzed_unigram"
],
"query": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"operator": "or",
"fuzziness": "auto",
"prefix_length": 3
}
}
]
}
}
}
{"took":6,"timed_out":false,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0},"hits":{"total":202,"max_score":0.0,"hits":[]...
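One possible client-side guard, sketched under assumptions rather than offered as a confirmed fix: count the input in Unicode code points and only request fuzzy matching when the input is short. The function name `build_query` and the threshold of 50 code points are hypothetical and would need tuning against real data. The field name and the other parameters are copied from the queries above. Input length is only a rough proxy here, because fuzziness is applied per term after analysis.

```python
import json

# Hypothetical threshold: inputs longer than this are sent without fuzziness.
MAX_FUZZY_CODE_POINTS = 50


def build_query(user_input: str, max_input_code_points: int = 250) -> dict:
    """Build the multi_match request body, dropping fuzziness for long inputs."""
    # Python slices by code point. This can split a base character from its
    # combining mark, which is acceptable for a sketch but worth handling properly.
    text = user_input[:max_input_code_points]

    multi_match = {
        "fields": ["body_text.analyzed_unigram"],
        "query": text,
        "operator": "or",
    }

    # Only ask for fuzzy matching when the input is short enough.
    if len(text) <= MAX_FUZZY_CODE_POINTS:
        multi_match["fuzziness"] = "auto"
        multi_match["prefix_length"] = 3

    return {"query": {"bool": {"should": [{"multi_match": multi_match}]}}}


# A long Thai input (80 code points) is sent without fuzziness.
print(json.dumps(build_query("วั" * 40), ensure_ascii=False, indent=2))
```

This trades recall on long inputs for predictability. Whether that trade-off is acceptable, and whether the failure is actually tied to input length in code points, is exactly what I am unsure about.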