I recently bumped into a small problem: I need to aggregate a field (i.e. find unique values) from the search results of a first query and use it as a filter in an ids (or terms) query on a second index.
As an example, let's say that I have an "events" index with millions and millions of events made by users. I can aggregate the userId from the results of some search on events and use it as a filter on the "users" index.
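To make the two steps concrete, here is a minimal sketch in Python that only builds the request body for the first step and reads the unique values back out of a response. The aggregation name "unique_users", the "userId" field name, the example event query and the sample response are all assumptions made up for illustration; no Elasticsearch client is involved.

```python
def build_user_id_aggregation(event_query, field="userId", max_ids=10000):
    """Search body for the "events" index: no hits, only unique field values."""
    return {
        "size": 0,  # skip the hits, only the aggregation is needed
        "query": event_query,
        "aggs": {
            # "unique_users" is a hypothetical aggregation name
            "unique_users": {"terms": {"field": field, "size": max_ids}}
        },
    }


def extract_user_ids(response, agg_name="unique_users"):
    """Collect the bucket keys (the unique values) of a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [str(bucket["key"]) for bucket in buckets]


# Hand-written fragment shaped like a terms aggregation response (not real data).
sample_response = {
    "aggregations": {
        "unique_users": {
            "buckets": [
                {"key": "u1", "doc_count": 42},
                {"key": "u2", "doc_count": 7},
            ]
        }
    }
}

# Hypothetical event search; any query on the "events" index would do here.
body = build_user_id_aggregation({"match": {"type": "login"}})
user_ids = extract_user_ids(sample_response)
```

The resulting user_ids list is what would then be passed as a filter to the search on the "users" index.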
A termsQuery (assuming I also have the id in a keyword field) with 10k+ ids is unfeasible. After setting
indices.query.bool.max_clause_count: 10000 it hogs the CPU for ~10 seconds to get a result on my laptop.
The idsQuery instead (I am using external ids in the users index) seems to be very fast. In a quick test I did, it was so fast that its cost was barely noticeable.
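For reference, the two variants I am comparing look roughly like the request bodies below, written as a small Python sketch. The keyword field name "id" is an assumption for illustration, and wrapping the query in a bool filter is just one possible way to express it.

```python
def build_ids_query(user_ids):
    """Body for the "users" index that filters on the document _id."""
    return {"query": {"bool": {"filter": {"ids": {"values": list(user_ids)}}}}}


def build_terms_query(user_ids, field="id"):
    """Same filter, but against a keyword field ("id" is a hypothetical name)."""
    return {"query": {"bool": {"filter": {"terms": {field: list(user_ids)}}}}}


# Example ids; in practice these would come from the aggregation on "events".
example_ids = ["u1", "u2", "u3"]
ids_body = build_ids_query(example_ids)
terms_body = build_terms_query(example_ids)
```

Both bodies carry the same list of values; the only difference is whether the match is on the document id or on an indexed field.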
Is this the right way to proceed? Should I do this differently?
I cannot denormalize the user into each event because the impact on disk usage would be huge.
I am currently using ES 5.1 and will upgrade to 5.2 soon. In my test I created an index with 10M documents and filtered for 10k ids.