Best practice for defining multiple indices


(Grigory Rubstein) #1

I store data for one year.
The data is time- and account-based, and every possible query refers to one account only.

Which is better for query performance: splitting all the data into time-based (monthly) indices, or creating multiple indices based on account number (for example, account number modulo 100)?

The data is:
400M docs per month.
About 100,000 accounts.
Not all accounts have the same number of documents.
A query always targets a single account, but it can cover any date in the year.

Thank you.


(Christian Dahlqvist) #2

Having time-based indices allows you to manage the retention period efficiently, and is the recommended way of handling retention whenever possible. You can, however, combine this with routing based on the account ID at both indexing and query time. This ensures that, within each index, all records belonging to a single account are sent to the same shard (you do not control which one), which allows queries to hit just one shard per monthly index instead of all of them. Whether this extra complexity is beneficial depends on your data volumes. I would recommend starting without routing and enabling it only if needed.
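
To make the combination of monthly indices and account routing more concrete, here is a minimal Python sketch. It only builds plain dictionaries describing the requests and does not call any client library. The index prefix `accounts`, the field names `account_id` and `timestamp`, and all function names are assumptions made up for illustration; `index` and `routing` correspond to the index name and the `routing` request parameter that Elasticsearch accepts when indexing and searching.

```python
from datetime import date

INDEX_PREFIX = "accounts"  # hypothetical index name prefix


def monthly_index(day, prefix=INDEX_PREFIX):
    """Name of the monthly index that holds documents for the given day."""
    return f"{prefix}-{day:%Y.%m}"


def indices_for_range(start, end, prefix=INDEX_PREFIX):
    """List every monthly index that overlaps the inclusive date range."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(monthly_index(date(year, month, 1), prefix))
        # Step to the first day of the next month, rolling over the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def index_request(account_id, day, doc):
    """Describe an indexing request routed by account ID."""
    return {
        "index": monthly_index(day),
        "routing": str(account_id),  # same value must be used when searching
        "document": {**doc, "account_id": account_id, "timestamp": day.isoformat()},
    }


def search_request(account_id, start, end):
    """Describe a search for one account over a date range."""
    return {
        "index": ",".join(indices_for_range(start, end)),
        "routing": str(account_id),  # limits the search to one shard per index
        "query": {
            "bool": {
                "filter": [
                    # Still required: a shard holds many accounts, not just one.
                    {"term": {"account_id": account_id}},
                    {"range": {"timestamp": {"gte": start.isoformat(),
                                             "lte": end.isoformat()}}},
                ]
            }
        },
    }
```

Note that routing only selects the shard, so the filter on the account field is still needed. Dropping the `routing` entries from both functions gives the simpler setup recommended as a starting point.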

