Filtered aliases and huge ids/terms filter

telendt · August 24, 2016, 4:11pm

Hi,

Here's the problem I'm trying to solve: we have quite a big search catalog (hundreds of millions of documents) which we need to split into two sets: a relatively small one (1-2%) and a big one (the remaining 98-99%). Each set should be queried independently (it would be great if we could get X results from both groups at the same time with a single search query, but that's not a requirement). The thing is that our documents change their "set membership" quite often and we don't want that to trigger document reindexing (which is a very complex task for us for various reasons). It's also worth mentioning that we're fine with changing the documents' set membership all at once in a periodic bulk operation (this change does not need to be "realtime").

So we thought about using filtered aliases for that. We started with an ids filter (containing the ids of the smaller set: ~1.5M ids, ~20MB in JSON format). Right after creating such an alias we noticed increased response times from the cluster, although the alias was not used by any query. We removed the alias and things went back to normal.
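For readers who want to picture the request, here is a minimal sketch of an alias body of that shape, assuming the ES 1.x `_aliases` API with an `ids` filter. The index name `catalog`, the alias name `small_set`, and the sample ids are placeholders, not the real names from our cluster.

```python
import json


def build_filtered_alias_body(index, alias, doc_ids):
    """Build the body for POST /_aliases that adds a filtered alias.

    The alias only matches documents whose _id is in doc_ids.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"ids": {"values": list(doc_ids)}},
                }
            }
        ]
    }


# Placeholder names and ids, for illustration only.
body = build_filtered_alias_body("catalog", "small_set", ["1", "4", "100"])
payload = json.dumps(body)
print(payload)
```

With ~1.5M ids in `values`, the serialized body is what reaches the ~20MB mentioned above.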

The next thing we tried was a terms filter on the "_id" field using the terms lookup mechanism. We created our "lookup document" (again, ~20MB of source with the ids of the documents from the smaller set), made sure that it's replicated to all nodes, and used it in a terms filter in a search query that normally takes ~100ms to complete (without that filter). After proper warmup we got response times of around 1-3s (plain execution mode, cached), which we can't accept. We experimented with different execution modes with no luck.
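A rough sketch of what such a query looks like, assuming the ES 1.x `filtered` query and terms lookup syntax. The lookup index `lookups`, type `idset`, document id `small_set`, and path `ids` are hypothetical names, and the `auto_expand_replicas` setting is only one assumed way of replicating the lookup index to all nodes.

```python
import json

# Assumed settings for the lookup index so that every node holds a copy.
LOOKUP_INDEX_SETTINGS = {"index": {"auto_expand_replicas": "0-all"}}


def build_lookup_document(doc_ids, path="ids"):
    """Source of the lookup document: one field holding all ids of the small set."""
    return {path: list(doc_ids)}


def build_terms_lookup_search(query, lookup_index, lookup_type, lookup_id,
                              path="ids", execution="plain"):
    """Wrap a query in a filtered query whose terms filter on _id
    reads its terms from a lookup document instead of the request body."""
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {
                    "terms": {
                        "_id": {
                            "index": lookup_index,
                            "type": lookup_type,
                            "id": lookup_id,
                            "path": path,
                        },
                        "execution": execution,
                    }
                },
            }
        }
    }


# Hypothetical names throughout; the match_all query stands in for the real one.
lookup_doc = build_lookup_document(["1", "4", "100"])
search_body = build_terms_lookup_search(
    {"match_all": {}}, "lookups", "idset", "small_set"
)
print(json.dumps(search_body, indent=2))
```

Changing the `execution` argument is how the other execution modes were tried.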

It's probably worth mentioning that we still use ES 1.x (1.7.3 to be more specific).

Any suggestions on how to solve this issue?

Best,
Tomasz

ywelsch · August 25, 2016, 9:19am

Can you quantify what you mean by

change their "set membership" quite often

Is it possible to calculate this membership based on the contents of the documents? Can you adapt the document contents so that this membership can be determined by ES instead of passing an explicit list of document ids?

telendt · August 25, 2016, 10:26am

Can you adapt the document contents so that this membership can be determined by ES instead of passing an explicit list of document ids?

But that would require reindexing a document whenever it moves from one set to the other, and that's something we would like to avoid.

ywelsch · August 25, 2016, 10:28am

No, what I meant is whether you can enrich the original document structure so that this dynamic membership can be defined in terms of standard filters.
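To illustrate the idea, here is a small sketch under a purely hypothetical assumption: that documents carry a numeric field (called `rank` here, a made-up name) and that the small set is everything at or above a threshold. Nothing in this thread says such a field exists. Moving documents between sets would then mean changing the threshold in the alias filter, not sending a list of ids.

```python
import json


def build_membership_filters(field, threshold):
    """Return (small_set_filter, big_set_filter) built from standard ES 1.x filters.

    The big set is defined as everything the small-set filter does not match.
    """
    small = {"range": {field: {"gte": threshold}}}
    big = {"bool": {"must_not": [small]}}
    return small, big


def build_alias_actions(index, small_alias, big_alias, field, threshold):
    """Body for POST /_aliases that defines both sets as filtered aliases."""
    small, big = build_membership_filters(field, threshold)
    return {
        "actions": [
            {"add": {"index": index, "alias": small_alias, "filter": small}},
            {"add": {"index": index, "alias": big_alias, "filter": big}},
        ]
    }


# "catalog", "small_set", "big_set", "rank" and 90 are all hypothetical values.
actions = build_alias_actions("catalog", "small_set", "big_set", "rank", 90)
print(json.dumps(actions, indent=2))
```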
