Algorithmic complexity of bool queries

rex-remind · December 9, 2020, 7:38am

I've been searching all over the net to get an understanding of the algorithmic complexity / rough big-O of bool queries for Elasticsearch, but I have come up dry so I'm asking here. (Note: this is not a question about correct query syntax, I will use psuedo-code that basically matches with the dsl for expediency.)

My use case - I have a query that has essentially a series of ors (shoulds) and ands (musts). To simplify I'll create a toy example:

query name='name' and filter: (x=['<uuid_1>', ...] and y=['<uuid_2>', ...]) or z='[<uuid_3>', ...])
order by name desc

where x, y, and z are keywords and name is text, name is in a sort.field with sort.order. name may be absent from some queries where we just filter.

In my use case x, y, and z all could contain one or more of half-a-billion values i.e. cardinality is 500,000,000. The number of documents is 500,000,000 as well.

We expect in the worst case for each = relation to match ~1 million documents.

So my question is, how can I reason about the performance? If this was just filtering on x and nothing more I'd expect it to be an O(1) fetch of 1 million uuids and it would just return the top 50 docs out of 1 million for 1 page which is pretty straight forward.

But how will a bool query like this operate? 3 x O(1) fetches and then join? How does the join happen? Does it need re-iterate over all the hits in memory and combine based on the bool logic? I'm really not sure.

I appreciate any insight here,
thanks!

Mark_Harwood · December 9, 2020, 8:46am

Adrien Grand has done a lot of great talks on Lucene internals. Here’s one: https://youtu.be/p51vIDWHWqk

His more recent work on Block max WAND is also of interest.

system · January 6, 2021, 8:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Bool and and/or filter Elasticsearch	4	389	July 6, 2017
Bool and And filter, which is faster? Elasticsearch	3	428	July 6, 2017
How does ES perform if the query does boolean operation on nested types Elasticsearch	3	822	January 4, 2019
Bug in bool queries? Elasticsearch	2	303	July 6, 2017
Elasticsearch query performance Elasticsearch	15	507	July 6, 2017

Algorithmic complexity of bool queries

Related topics