Does it make sense to separate rarely used documents for faster performance?

I'm designing a system that will hold ~2M documents, and 95% of the queries will target only 40-50k of them.

Each document in the system has a boolean state: active or inactive.

About 50k documents are active and the remaining ~1.95 million are inactive.

So my question is: does it make sense to separate these documents? If so, how can I separate them? Or do I not need to worry about this at all? Maybe I could use shards and a hash function that chooses a bucket depending on the active/inactive state?

I could also introduce two indexes, one holding the active documents and one holding the inactive ones, so that most queries do their lookup in the small index. However, this solution will require more maintenance and could cause more headaches in the long run.

I'm not fully aware of the capabilities ES provides, and I would love to hear from someone with more experience.

Two million documents is not that much, so I would go with a single index to start simple, and just have an active boolean field on each document. Make sure you query that field in filter context, so that the active/inactive filter can make use of the node query cache.
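
To make the filter suggestion concrete, here is a minimal sketch of a search body that puts the active flag in the `filter` clause of a `bool` query. It only builds the JSON body, so it runs without a cluster. The `active` field comes from the suggestion above; the `title` field and the helper name `build_active_query` are hypothetical and only there for illustration.

```python
import json


def build_active_query(text, field="title", active=True):
    """Build a search body that scores on `text` and filters on `active`.

    `field` ("title") is a hypothetical full-text field; replace it with
    whatever your mapping actually uses.
    """
    return {
        "query": {
            "bool": {
                # Scored part of the query: contributes to relevance.
                "must": [{"match": {field: text}}],
                # Filter context: yes/no match, no scoring, eligible for caching.
                "filter": [{"term": {"active": active}}],
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send to the index's _search endpoint.
    print(json.dumps(build_active_query("quarterly report"), indent=2))
```

The resulting body can be sent to the `_search` endpoint of the single index. Because the `term` clause sits in filter context it does not affect scoring, and Elasticsearch can decide to cache it; whether a given filter actually gets cached depends on the cluster's own caching heuristics.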

