I'm designing a system that will hold ~2M documents, and 95% of the queries will target only 40-50k of them.
Documents in the system have a boolean field that puts them in one of two states: `active` or `inactive`.
About 50k documents are `active`, and the remaining ~1.95 million are `inactive`.
So my question is: does it make sense to separate these documents? How can I separate them? Do I need to worry about this at all? Maybe I could use shards and a hash function that chooses a bucket based on the active/inactive state?
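A rough sketch of the hash-bucket idea, assuming a simplified routing model: the hypothetical `shard_for` hashes a routing value with `zlib.crc32` and takes it modulo a hypothetical `NUM_SHARDS`. Elasticsearch's real routing uses its own Murmur3-based formula, so this only illustrates the shape of the problem: when the routing value has just two possible states, every document lands on at most two shards, and one shard ends up holding at least the ~1.95M inactive documents.

```python
import zlib
from collections import Counter

# Hypothetical model: NUM_SHARDS and shard_for() are illustrative stand-ins.
# Elasticsearch uses its own Murmur3-based routing formula, not crc32.
NUM_SHARDS = 5

def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number with a stable hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards

def shard_distribution(active_count: int, inactive_count: int) -> Counter:
    """Count how many documents land on each shard when routing by state."""
    counts = Counter()
    counts[shard_for("active")] += active_count
    counts[shard_for("inactive")] += inactive_count
    return counts

dist = shard_distribution(50_000, 1_950_000)
print(dist)
```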
I could also introduce two indexes, one holding the `active` documents and one holding the `inactive` ones. That way most queries would do a lookup in a much smaller index. However, this solution requires more maintenance and could cause more headaches in the long run.
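A minimal in-memory sketch of the two-index approach, where plain dicts stand in for two Elasticsearch indexes and the names `TwoIndexStore`, `docs_active` and `docs_inactive` are hypothetical. It shows where the extra maintenance comes from: flipping a document's state means deleting it from one index and writing it to the other, while the common query only touches the small active index.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for two Elasticsearch indexes.
# The index names "docs_active" and "docs_inactive" are illustrative.
@dataclass
class TwoIndexStore:
    indexes: dict = field(
        default_factory=lambda: {"docs_active": {}, "docs_inactive": {}}
    )

    @staticmethod
    def index_for(active: bool) -> str:
        """Pick the index name that matches a document's state."""
        return "docs_active" if active else "docs_inactive"

    def put(self, doc_id: str, doc: dict) -> None:
        """Write a document into the index matching its state."""
        target = self.index_for(doc["active"])
        other = self.index_for(not doc["active"])
        # A state change means the document must also be removed from the
        # other index; this is the extra maintenance of the two-index design.
        self.indexes[other].pop(doc_id, None)
        self.indexes[target][doc_id] = doc

    def search(self, predicate, include_inactive: bool = False) -> list:
        """Most queries only scan the small active index."""
        names = ["docs_active"] + (["docs_inactive"] if include_inactive else [])
        return [d for n in names for d in self.indexes[n].values() if predicate(d)]

# Usage: index two documents, then deactivate the first one.
store = TwoIndexStore()
store.put("1", {"active": True, "title": "a"})
store.put("2", {"active": False, "title": "b"})
store.put("1", {"active": False, "title": "a"})
```

After the last `put`, document `1` lives only in the inactive index, so a default `search` over the active index no longer returns it.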
I'm not fully aware of the capabilities ES provides, and I would love to hear from someone with more experience.