I've been searching all over the net to get an understanding of the algorithmic complexity / rough big-O of bool queries in Elasticsearch, but I have come up dry, so I'm asking here. (Note: this is not a question about correct query syntax; I will use pseudo-code that roughly matches the DSL for expediency.)
My use case - I have a query that has essentially a series of ors (shoulds) and ands (musts). To simplify I'll create a toy example:
query name='name' and filter: ((x=['<uuid_1>', ...] and y=['<uuid_2>', ...]) or z=['<uuid_3>', ...])
order by name desc
where `x`, `y`, and `z` are keywords and `name` is text; `name` is used as a `sort.field` with a `sort.order`. `name` may be absent from some queries where we just filter.
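For concreteness, here is roughly what the toy query above would look like in the actual query DSL. This is only a sketch: the index name, the `terms` clauses, and sorting on a hypothetical `name.keyword` subfield (since `name` itself is text) are my assumptions, not something I've settled on.

```json
POST /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "name" } }
      ],
      "filter": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  { "terms": { "x": ["<uuid_1>"] } },
                  { "terms": { "y": ["<uuid_2>"] } }
                ]
              }
            },
            { "terms": { "z": ["<uuid_3>"] } }
          ],
          "minimum_should_match": 1
        }
      }
    }
  },
  "sort": [ { "name.keyword": "desc" } ],
  "size": 50
}
```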
In my use case, `x`, `y`, and `z` can each take one of half a billion distinct values, i.e. the cardinality is 500,000,000. The number of documents is 500,000,000 as well.
We expect each `=` relation to match ~1 million documents in the worst case.
So my question is: how can I reason about the performance? If this were just a filter on `x` and nothing more, I'd expect an O(1) fetch of 1 million UUIDs, returning the top 50 docs out of 1 million for one page, which is straightforward.
But how will a bool query like this operate? Three O(1) fetches and then a join? How does the join happen? Does it need to re-iterate over all the hits in memory and combine them based on the bool logic? I'm really not sure.
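To make my mental model concrete, this toy Python sketch (with made-up doc IDs) shows the naive set-combination behavior I'm imagining; the question is whether Lucene does something like this or something smarter:

```python
# Naive mental model (not necessarily how Lucene actually works -- that's
# the question): each term filter yields a set of matching doc IDs, and
# the bool logic combines them with set operations.
x_hits = {1, 2, 3}   # docs matching x=<uuid_1>
y_hits = {2, 3, 4}   # docs matching y=<uuid_2>
z_hits = {4, 5}      # docs matching z=<uuid_3>

# filter: ((x AND y) OR z)
combined = (x_hits & y_hits) | z_hits
print(sorted(combined))  # [2, 3, 4, 5]
```

In the worst case each of these sets has ~1 million entries, so whether the combination is done eagerly like this or lazily per-document matters a lot to me.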
I appreciate any insight here, thanks!