We occasionally see the following exception. It seems to depend on whether or not the search contains non-ASCII Unicode characters.
{
  "error": {
    "root_cause": [
      {
        "type": "too_complex_to_determinize_exception",
        "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "redacted",
        "node": "pxuPrKQMSdq1c1p8Cwrb9Q",
        "reason": {
          "type": "too_complex_to_determinize_exception",
          "reason": "too_complex_to_determinize_exception: Determinizing automaton with 18934 states and 27853 transitions would result in more than 10000 states."
        }
      }
    ]
  },
  "status": 500
}
We limit our search input field to 250 characters. If I search for 250 'a' characters it works fine. However, if I search for 250 'วั' characters we see the exception above. Why is this the case? How can I impose a safe limit on my input field if the limit depends on the character encoding?
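For reference, here is a minimal sketch (plain Python; the variable names are ours) of why the same 250-character limit produces very different inputs depending on the script. 'วั' is actually two Unicode code points, ว (U+0E27) plus the combining vowel ั (U+0E31), and each of those takes three bytes in UTF-8:

s_ascii = "a" * 250
s_thai = "วั" * 250  # each "วั" is two code points: U+0E27 + U+0E31

# len() on a Python str counts Unicode code points
print(len(s_ascii), len(s_ascii.encode("utf-8")))  # 250 code points, 250 bytes
print(len(s_thai), len(s_thai.encode("utf-8")))    # 500 code points, 1500 bytes

So an input field that counts visible characters can let through two to six times as many code points or bytes as intended; capping the query by code points (or UTF-8 bytes) rather than rendered characters would make the limit encoding-independent. For what it's worth, the 10000-state ceiling in the error is Lucene's default determinization limit; if the search is executed as a regexp query, that limit is exposed as the query's max_determinized_states parameter.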