Terms lookup mechanism causes too_many_clauses exception

xavierfacq · November 30, 2017, 2:19pm

Hi guys,

I have a problem with big queries using the Terms lookup mechanism.

@see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html#query-dsl-terms-lookup

As we have more and more customers (which is good news), we have more and more values.
I'm trying to refactor our queries and I found the Terms lookup mechanism, which is very
interesting in our case. BUT the "lookup" values can be large: the maximum size is currently 118195 keys for 1 user.

So when I add the terms query, I get the following error (which is expected, I understand):
Caused by: NotSerializableExceptionWrapper[too_many_clauses: maxClauseCount is set to 1024]

If I raise the limit in the configuration to a large value it works OK, but is it a good solution? What problems could we run into?
Do some of you use a high value such as index.query.bool.max_clause_count: 100000?

bye,
Xavier

Christian_Dahlqvist · November 30, 2017, 2:56pm

What is the problem you are solving with all these terms? What do the terms represent?

xavierfacq · November 30, 2017, 3:13pm

Hi Christian,

Here is the "project":
Let's say we have products in an index, and imagine that customers can "subscribe" to products.
A customer can search his products, so I use the Terms lookup mechanism as a join between products
and user selections. The problem is that some users have subscribed to more than 1024 products.
If it were a super simple query, I would run multiple queries with the subscriptions split into batches, but the existing query
is already so big and complex that it's not possible to do that.

NOTE: The solution we currently have is that the product index has a field with the userid, but I'm looking for another architecture.

Xavier

Christian_Dahlqvist · November 30, 2017, 3:22pm

Maybe you could use parent-child to maintain the relationship. Let the product document be the parent and create a child document per user that subscribed to this product. Add a child to subscribe and remove it to unsubscribe.

xavierfacq · November 30, 2017, 3:30pm

I have questions about this solution:
First, you need to know that we run a full reindex and index rolling every weekend.

1°/ Would you create a child document per user in the product index or in another one? If you mean in the product index, it's the same solution we already have, and that's why we want to change (the full reindex takes too long because the query is superrrrrr long).

2°/ Is it a problem to have a parent-child relationship between indices that are dropped and reindexed? (But they are known by a fixed alias.)

thx !

Christian_Dahlqvist · November 30, 2017, 3:32pm

Well, that is new information. Why are you reindexing and rolling indices every weekend?

Parent-child requires it to be in the same index.

The parent and all associated children must reside in the same shard.

xavierfacq · November 30, 2017, 3:36pm

Yes, that's what I just read, and it's not possible.

I'll look for another solution, because setting index.query.bool.max_clause_count: 100000
is probably not a solution...

xavierfacq · December 1, 2017, 8:08am

@Christian_Dahlqvist Do you have a recommendation for the maximum number of values in this field? They are Long ids in our case... And the biggest array is about 118K :-/

Christian_Dahlqvist · December 1, 2017, 8:12am

No, I have never tried anything at that scale.

How come you are reindexing and rolling indices every weekend? Is there no way to enhance this process to allow a different solution to the problem?

xavierfacq · December 1, 2017, 8:20am

Ok, thank you.

Reindexing is done every weekend (and partially at night) because there are changes that are not synchronized from our MySQL database. For old crappy reasons.

thx,
Xavier

xavierfacq · December 1, 2017, 12:26pm

FYI: https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_a_TooManyClauses_exception.3F

Increase the number of terms using BooleanQuery.setMaxClauseCount(). Note that this will increase the memory requirements for searches that expand to many terms. To deactivate any limits, use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).

jpountz · December 1, 2017, 2:24pm

I would discourage increasing the maximum number of clauses; it tends to be a source of problems, as Lucene may need to read from {max_clause_count} locations on the disk in parallel.

In general, terms queries are only subject to the maximum clause count if their score is required. So I'm wondering whether you could work around your issue by just putting your terms query in a filter context, such as under a constant_score query or in a bool filter clause (assuming that you don't need scores for that query)?

xavierfacq · December 1, 2017, 3:08pm

You are right, changing a terms query into a bool.filter.terms no longer throws a too_many_clauses exception.

(Given an index with a doc whose field "concurrent_company_ids" has more than 1024 values)
This causes an exception:

  {
    "from": 0,
    "size": 10,
    "query": {
      "terms": {
        "external_company_id": {
          "index": "index-user-territory",
          "type": "user_territory",
          "id": "85",
          "path": "concurrent_company_ids"
        }
      }
    }
  }

This one doesn't:

  {
    "from": 0,
    "size": 10,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "external_company_id": {
                "index": "index-user-territory",
                "type": "user_territory",
                "id": "85",
                "path": "concurrent_company_ids"
              }
            }
          }
        ]
      }
    }
  }

Now, I have to try to change all queries to filtered or constant_score... it's going to be tough

jpountz · December 1, 2017, 4:55pm

Cool, thanks for bringing closure!

xavierfacq · December 11, 2017, 3:11pm

Note that it also works when the terms lookup is encapsulated into a constant_score query.
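
For reference, a minimal sketch of what the constant_score variant could look like. It reuses the same lookup document (index, type, id and path) as the bool filter example earlier in the thread; the exact body is an assumption based on the documented constant_score syntax, not a copy of the production query:

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "external_company_id": {
            "index": "index-user-territory",
            "type": "user_territory",
            "id": "85",
            "path": "concurrent_company_ids"
          }
        }
      }
    }
  }
}
```

As with the bool filter version, the terms lookup runs in filter context here, so every matching document gets the same score and relevance ranking on this clause is lost.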

thx,
Xavier

system · January 8, 2018, 3:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
