How can I avoid a TooManyClauses failure when I use the kuromoji tokenizer together with a synonym filter whose synonym dictionary contains about 500K words? How can I calculate an appropriate value for the max clause count? Or is there another way to avoid this failure while still using this synonym dictionary, other than increasing max_clause_count?
Situation: A Simple Query String query fails with TooManyClauses for a particular word. I can avoid the failure for that word by setting max_clause_count to 153600, but it occurs again for another word, which seems to need an even higher value.
Additional information: I'm using the kuromoji tokenizer with a synonym filter backed by a synonym dictionary of about 500K words. The query does not fail without the synonym filter, even for the same word. It also does not fail when the synonym filter uses a different, smaller synonym dictionary.
Do you know which word the query fails for? Is it a single token? Can you check what it analyzes to with the "_analyze" endpoint, using the same analyzer you have configured for the target field?
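One possible way to reason about the clause count, offered as an assumption rather than a confirmed explanation: when synonyms span multiple positions, a graph-aware query builder may enumerate every path through the token graph and turn each path into its own clause. If that is what happens here, the number of clauses grows with the product of the alternatives at each position, which would explain why one word needs 153600 and another needs more. The sketch below counts those paths from a list of tokens shaped like the "tokens" array of an "_analyze" response. The helper name `count_graph_paths` is hypothetical, and it assumes each token carries a "position" and an optional "positionLength" (treated as 1 when your version does not report it).

```python
from collections import defaultdict


def count_graph_paths(tokens):
    """Count distinct start-to-end paths through a token graph.

    Hypothetical helper. Each token is a dict like an entry in the
    "tokens" array of an _analyze response, with a "position" and an
    optional "positionLength" (assumed to be 1 when absent).
    """
    if not tokens:
        return 0
    # Each token is an edge from its start position to start + length.
    edges = defaultdict(list)
    for tok in tokens:
        start = tok["position"]
        end = start + tok.get("positionLength", 1)
        edges[start].append(end)
    first = min(edges)
    last = max(end for ends in edges.values() for end in ends)
    # paths[p] = number of ways to reach position p from the first position.
    paths = defaultdict(int)
    paths[first] = 1
    # Edges always point forward, so one pass in position order suffices.
    for pos in range(first, last):
        if paths[pos] == 0:
            continue
        for end in edges.get(pos, []):
            paths[end] += paths[pos]
    return paths[last]


# Two alternatives at position 0, two at position 1, plus one
# multi-position synonym covering both: 2 * 2 + 1 = 5 paths.
example = [
    {"token": "a", "position": 0},
    {"token": "b", "position": 0},
    {"token": "c", "position": 1},
    {"token": "d", "position": 1},
    {"token": "ab", "position": 0, "positionLength": 2},
]
print(count_graph_paths(example))
```

If the path count for the failing word comes out close to the max_clause_count it needs, that would support the path-explosion explanation. If the counts do not match, the clauses are likely coming from somewhere else in the query.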
For further ideas, and so that others can chime in on your question, it would also help to see the analysis chain and the mapping of your index. It would also be interesting to know whether you observe the same behaviour with a more recent version of ES than 5.6.