BooleanQuery$TooManyClauses with synonym filter

tomoko · October 15, 2019, 11:40am

How can I avoid a TooManyClauses failure when I use the kuromoji tokenizer with a synonym filter whose synonym dictionary contains about 500K words? How can I calculate an appropriate value for the max clause count? Or is there any way to avoid this failure with this synonym dictionary other than increasing max_clause_count?

Situation: A simple_query_string query fails with TooManyClauses for a certain word. I can avoid the failure for that word by setting max_clause_count to 153600, but it occurs again for another word, which seems to need an even higher value.

Additional information: I'm using the kuromoji tokenizer with a synonym filter whose synonym dictionary contains about 500K words. The query does not fail without the synonym filter, even for the same word. It also does not fail with the synonym filter when a different, small synonym dictionary is used.

Environment:
Elasticsearch v5.6.8
/etc/elasticsearch/elasticsearch.yml
indices.query.bool.max_clause_count: 153600

Query body:
{
  "query": {
    "simple_query_string": {
      "query": "atext",
      "fields": ["field1"]
    }
  }
}

Error log: /var/log/elasticsearch/elasticsearch.log
org.elasticsearch.index.query.QueryShardException: failed to create query: {
"simple_query_string" : {
"query" : "atext",
"fields" : [
"field1^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : false,
"boost" : 1.0
}
}
Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 153600

cbuescher · October 16, 2019, 7:53am

Do you know the word the query fails for? Is it just one token? Can you check what it analyzes to if you use the "_analyze" endpoint for it, using the same analyzer you have configured for that target field?
For further ideas and for others to chime in on your question it would also be helpful to see the analysis chain and the mapping of your index. It would also be interesting to know if you observe the same behaviour with a more recent version of ES than 5.6.

tomoko · October 17, 2019, 5:37am

Hi, cbuescher. Thank you for your support. Here are my answers.

Using the same analyzer that is configured for the target field, Elasticsearch analyzes the word into 5097 tokens. The word is Japanese.

# Run the field's analyzer on the failing word and count the resulting tokens
data = es.indices.analyze(
    index=esIndex,
    body={
        "analyzer": my_analyzer,
        "text": "atext",
    },
)
numoftoken = len(data.get("tokens"))
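
Beyond the total count, it may help to see how the tokens are distributed across positions, since synonyms are stacked at the position of the term they expand. The following is a minimal sketch, not something from the thread: the helper names `tokens_per_position` and `summarize` are hypothetical, and `sample` is made-up data that only mimics the shape of an `_analyze` response (a `tokens` list whose entries carry a `position` field).

```python
from collections import Counter


def tokens_per_position(analyze_response):
    """Count how many tokens the analyzer emitted at each position."""
    tokens = analyze_response.get("tokens", [])
    return Counter(tok["position"] for tok in tokens)


def summarize(analyze_response, top=5):
    """Return the total token count, the number of distinct positions,
    and the positions that carry the most tokens."""
    counts = tokens_per_position(analyze_response)
    return {
        "total_tokens": sum(counts.values()),
        "distinct_positions": len(counts),
        "busiest_positions": counts.most_common(top),
    }


# Hypothetical data in the shape of an _analyze response (not real output)
sample = {
    "tokens": [
        {"token": "a", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0},
        {"token": "b", "start_offset": 0, "end_offset": 1, "type": "SYNONYM", "position": 0},
        {"token": "c", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1},
    ]
}

print(summarize(sample))
```

With a real response, `summarize(data)` can be called on the `data` value returned by `es.indices.analyze` above.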

When max_clause_count is 102400,
word.A : 5097 tokens -> TooManyClauses error
word.B : 716 tokens -> TooManyClauses error
word.C : 4691 tokens -> OK

When max_clause_count is 153600,
word.A : 5097 tokens -> TooManyClauses error
word.B : 716 tokens -> OK
word.C : 4691 tokens -> OK
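
These results suggest that the raw token count from `_analyze` does not by itself decide whether the limit is hit: at 102400, word.B fails with 716 tokens while word.C passes with 4691. The sketch below only re-checks that arithmetic on the numbers reported above. The helper name `threshold_exists` is hypothetical, and the code makes no claim about how Lucene actually counts clauses.

```python
# Observations copied from the post:
# (word, token count, fails at 102400, fails at 153600)
observations = [
    ("word.A", 5097, True, True),
    ("word.B", 716, True, False),
    ("word.C", 4691, False, False),
]


def threshold_exists(rows):
    """Return True if a single token-count cutoff could separate the
    failing words from the passing ones, i.e. every failing word has
    more tokens than every passing word."""
    failing = [count for count, failed in rows if failed]
    passing = [count for count, failed in rows if not failed]
    if not failing or not passing:
        return True
    return min(failing) > max(passing)


at_102400 = [(count, f102) for _, count, f102, _ in observations]
at_153600 = [(count, f153) for _, count, _, f153 in observations]

print("cutoff possible at 102400:", threshold_exists(at_102400))
print("cutoff possible at 153600:", threshold_exists(at_153600))
```

Since no cutoff on token count fits the 102400 results, something other than the number of tokens (for example how the tokens combine in the generated query) presumably drives the clause count.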

Here are the index settings (analysis chain) and the mapping of the index.
setting = {
    "settings": {
        "index": {"number_of_shards": 1},
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "search-kuromoji",
                    "filter": ["synonym", "greek_lowercase", "katakana_readingform"]
                }
            },
            "tokenizer": {
                "search-kuromoji": {"type": "kuromoji_tokenizer", "mode": "search"}
            },
            "filter": {
                "synonym": {"type": "synonym", "synonyms_path": SYNONYMS_PATH},
                "greek_lowercase": {"type": "lowercase", "language": "greek"},
                "katakana_readingform": {"type": "kuromoji_readingform", "use_romaji": False}
            }
        }
    }
}
mappings = {
    'symptom': {
        'properties': {
            'field1': {'type': 'text', 'index': 'true', 'analyzer': 'my_analyzer'},
            'code': {'type': 'text'}
        }
    }
}

# Create the index with the analysis settings, then add the mapping to the same index
es.indices.create(index=an_index, body=setting, request_timeout=30)
es.indices.put_mapping(index=an_index, doc_type=a_doctype, body=mappings)
