Terms lookup mechanism causes too_many_clauses exception

Hi guys,

I have a problem with big queries using the Terms lookup mechanism.

@see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html#query-dsl-terms-lookup

As we gain more and more customers (which is good news), we have more and more values.
I'm trying to refactor our queries and I found the terms lookup mechanism, which looks very
useful in our case. BUT the lookup values can be large: the current maximum is 118195 keys for a single user.

So when I add the terms query, I get the following error (which, as I understand it, is expected):
Caused by: NotSerializableExceptionWrapper[too_many_clauses: maxClauseCount is set to 1024]

If I raise the limit in the configuration to a large value it works, but is that a good solution? What problems could it cause?
Does anyone run with a high value such as index.query.bool.max_clause_count: 100000?

bye,
Xavier

What is the problem you are solving with all these terms? What does the terms represent?

Hi Christian,

Here is the "project":
Let's say we have products in an index, and imagine that customers can "subscribe" to products.
A customer can search their products, so I use the terms lookup mechanism as a join between products
and user selections. The problem is that some users have subscribed to more than 1024 products.
If it were a very simple query, I would run multiple queries with the subscriptions split into batches, but the existing query
is already so big and complex that this isn't possible.
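
The batching approach mentioned above (one query per slice of subscriptions) could look roughly like this in Python. It is a minimal sketch, assuming the subscription ids are already available client-side; the helper names and the `product_id` field are hypothetical, and each body would be sent as a separate search request.

```python
from typing import Dict, Iterator, List, Sequence

MAX_CLAUSES = 1024  # default maxClauseCount reported in the error message


def chunk_ids(ids: Sequence[int], size: int = MAX_CLAUSES) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` values."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def batched_terms_queries(field: str, ids: Sequence[int],
                          size: int = MAX_CLAUSES) -> List[Dict]:
    """Build one search body per batch (hypothetical workflow: each body is
    sent as its own request and the hits are merged client-side)."""
    return [{"query": {"terms": {field: batch}}} for batch in chunk_ids(ids, size)]


if __name__ == "__main__":
    subscriptions = list(range(2500))  # hypothetical user with 2500 subscriptions
    bodies = batched_terms_queries("product_id", subscriptions)
    print(len(bodies))  # 3 batches: 1024 + 1024 + 452 ids
```

Merging the batches client-side means sorting, pagination (`from`/`size`) and aggregations have to be recombined across all responses, which is why this does not fit a large, complex query.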

NOTE: Our current solution is that the product index has a field containing user ids, but I'm looking for another architecture.

Xavier

Maybe you could use parent-child to maintain the relationship. Let the product document be the parent and create a child document per user that subscribed to this product. Add a child to subscribe and remove it to unsubscribe.
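
A minimal sketch of that parent-child layout, assuming Elasticsearch 7.x or later, where parent-child is expressed with a `join` field and typeless mappings (older 5.x releases used a `_parent` mapping instead). The field names `relation` and `user_id` and the helper functions are hypothetical; the bodies are plain Python dicts that any client could send. Each child is indexed with its routing set to the parent id, because parent and children must live on the same shard.

```python
import json
from typing import Dict, Tuple

# Hypothetical products index: "product" docs are parents of "subscription" docs.
# Parent documents would set "relation": "product".
PRODUCTS_MAPPING: Dict = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"product": "subscription"}},
            "user_id": {"type": "long"},
        }
    }
}


def subscription_doc(product_id: str, user_id: int) -> Tuple[Dict, str]:
    """Return (child document, routing value) for subscribing `user_id`.

    The routing value is the parent id so the child lands on the parent's shard.
    Unsubscribing means deleting this child document with the same routing.
    """
    doc = {
        "relation": {"name": "subscription", "parent": product_id},
        "user_id": user_id,
    }
    return doc, product_id


def products_of_user(user_id: int) -> Dict:
    """Search body: products having at least one subscription child for user_id."""
    return {
        "query": {
            "has_child": {
                "type": "subscription",
                "query": {"term": {"user_id": user_id}},
            }
        }
    }


if __name__ == "__main__":
    doc, routing = subscription_doc("product-42", 85)
    print(routing)
    print(json.dumps(products_of_user(85)))
```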

I have questions about this solution:
First, you should know that we run full reindexes and roll indices every weekend.

1°/ Would you create the child documents (one per user) in the product index or in a separate index? If you mean in the product index, that's essentially the solution we already have, and it's why we want to change (the full reindex takes too long because the query is super long).

2°/ Is it a problem to have a parent-child relationship between indices that are dropped and reindexed? (They are accessed through a fixed alias.)

thx!

Well, that is new information. Why are you reindexing and rolling indices every weekend?

Parent-child requires it to be in the same index.

The parent and all associated children must reside in the same shard.

Yes, that's what I just read, and it's not possible for us :frowning:

I'll look for another solution, because setting index.query.bool.max_clause_count: 100000
is probably not the answer...

@Christian_Dahlqvist Do you have a recommendation for the maximum value of this setting? In our case the values are long ids... and the biggest array is about 118K :-/

No, I have never tried anything at that scale.

How come you are reindexing and rolling indices every weekend? Is there no way to enhance this process to allow a different solution to the problem?

Ok, thank you.

Full reindexes run every weekend (and partial ones at night) because some changes in our MySQL database are not synchronized. For old crappy reasons :wink:

thx,
Xavier

FYI: https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_a_TooManyClauses_exception.3F

Increase the number of terms using BooleanQuery.setMaxClauseCount(). Note that this will increase the memory requirements for searches that expand to many terms. To deactivate any limits, use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).

I would discourage increasing the maximum number of clauses; it tends to be a source of problems, as Lucene may need to read from {max_clause_count} locations on disk in parallel.

In general, terms queries are only subject to the maximum clause count if their score is required. So I'm wondering whether you could work around your issue by putting your terms query in a filter context, such as under a constant_score query or in a bool filter clause (assuming that you don't need scores for that query)?
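
To make the bool filter suggestion concrete, here is a small Python sketch that moves a terms lookup into filter context while keeping another clause scored. The helper names are hypothetical and the `{"match": {"name": "acme"}}` criterion is an invented placeholder; the `bool.filter` / `bool.must` shapes are standard Query DSL, and the lookup values are the ones used in this thread.

```python
import json
from typing import Dict, Optional


def terms_lookup(field: str, index: str, doc_type: str, doc_id: str, path: str) -> Dict:
    """Terms lookup clause: match `field` against the values stored at `path`
    in document `doc_id` of `index`."""
    return {"terms": {field: {"index": index, "type": doc_type, "id": doc_id, "path": path}}}


def in_bool_filter(filter_query: Dict, scored_query: Optional[Dict] = None) -> Dict:
    """Put `filter_query` in bool.filter (it restricts matches but does not score);
    keep `scored_query`, if given, in bool.must so it still contributes to _score."""
    body: Dict = {"filter": [filter_query]}
    if scored_query is not None:
        body["must"] = [scored_query]
    return {"bool": body}


if __name__ == "__main__":
    lookup = terms_lookup("external_company_id", "index-user-territory",
                          "user_territory", "85", "concurrent_company_ids")
    # Hypothetical full-text criterion kept in scoring context.
    query = in_bool_filter(lookup, {"match": {"name": "acme"}})
    print(json.dumps({"query": query}, indent=2))
```

Using bool.filter for just the lookup, rather than wrapping the whole query, keeps relevance scoring for the other criteria; only the lookup clause stops contributing to `_score`.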

You are right: changing a terms query into a bool.filter.terms no longer throws a too_many_clauses exception.

(Given an index containing a document whose "concurrent_company_ids" field holds more than 1024 values.)
This one causes an exception:

  {
    "from": 0,
    "size": 10,
    "query": {
      "terms": {
        "external_company_id": {
          "index": "index-user-territory",
          "type": "user_territory",
          "id": "85",
          "path": "concurrent_company_ids"
        }
      }
    }
  }

This one doesn't:

  {
    "from": 0,
    "size": 10,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "external_company_id": {
                "index": "index-user-territory",
                "type": "user_territory",
                "id": "85",
                "path": "concurrent_company_ids"
              }
            }
          }
        ]
      }
    }
  }

Now I have to change all our queries to use a filter context or constant_score... it's going to be tough :frowning:
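
Rewriting every query by hand is tedious; part of it could be automated if the queries are built as plain dicts (an assumption about the codebase). The hypothetical `wrap_terms` function below walks a query and wraps each `terms` clause that sits in a scoring position in `constant_score`. It only traverses `bool` and `constant_score`, so terms clauses under other compound queries (`nested`, `dis_max`, `function_score`, ...) are left untouched. Apply it to the `query` part of a search body, never the whole body, so that a `terms` aggregation is not rewritten. Wrapped clauses contribute a constant score instead of a term-based one, so result ordering may change.

```python
import copy
import json
from typing import Any, Dict

OCCURRENCES = ("must", "should", "filter", "must_not")
NON_SCORING = ("filter", "must_not")  # bool clauses that never score


def _rewrite_clauses(clauses: Any, in_filter: bool) -> Any:
    # A bool clause may hold a single query or a list of queries.
    if isinstance(clauses, list):
        return [wrap_terms(c, in_filter) for c in clauses]
    return wrap_terms(clauses, in_filter)


def wrap_terms(query: Dict, in_filter: bool = False) -> Dict:
    """Return a copy of `query` where every `terms` query in a scoring
    position is wrapped in constant_score; the input is not modified."""
    if "terms" in query:
        clause = copy.deepcopy(query)
        return clause if in_filter else {"constant_score": {"filter": clause}}
    if "bool" in query:
        rewritten = {}
        for key, value in query["bool"].items():
            if key in OCCURRENCES:
                rewritten[key] = _rewrite_clauses(value, in_filter or key in NON_SCORING)
            else:  # minimum_should_match, boost, ...
                rewritten[key] = copy.deepcopy(value)
        return {"bool": rewritten}
    if "constant_score" in query:
        body = copy.deepcopy(query["constant_score"])
        if "filter" in body:
            body["filter"] = wrap_terms(body["filter"], True)
        return {"constant_score": body}
    return copy.deepcopy(query)  # other query types are copied unchanged


if __name__ == "__main__":
    body = {"from": 0, "size": 10,
            "query": {"terms": {"external_company_id": {
                "index": "index-user-territory", "type": "user_territory",
                "id": "85", "path": "concurrent_company_ids"}}}}
    body["query"] = wrap_terms(body["query"])
    print(json.dumps(body, indent=2))
```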

Cool, thanks for bringing closure!

Note that it also works when the terms lookup is wrapped in a constant_score query. :+1:
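
For reference, a sketch of that constant_score variant of the failing request body, built as a Python dict with the values from this thread. The helper name is hypothetical, and the `type` key inside the lookup belongs to the mapping-type era of this thread (it was deprecated together with mapping types in later Elasticsearch versions).

```python
import json
from typing import Dict


def constant_score_lookup_body(index: str, doc_type: str, doc_id: str, path: str,
                               field: str, from_: int = 0, size: int = 10) -> Dict:
    """Search body that runs a terms lookup inside constant_score (filter context)."""
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    return {
        "from": from_,
        "size": size,
        "query": {"constant_score": {"filter": {"terms": {field: lookup}}}},
    }


if __name__ == "__main__":
    body = constant_score_lookup_body("index-user-territory", "user_territory", "85",
                                      "concurrent_company_ids", "external_company_id")
    print(json.dumps(body, indent=2))
```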

thx,
Xavier
