We've got a very big dataset to index. Our idea was to create a single index and use routing based on user_id (around 200 users, growing every month).
The thing is that some user_ids have many hundreds of millions of lines to index. So we thought of creating one index per year, routing on user_id, and creating an alias that covers all those indexes.
For example, an alias called "LOG" containing LOG2010, LOG2011, LOG2012, LOG2013 and so on...
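
To make the alias idea concrete, here is a minimal Python sketch of the request body I have in mind for the `_aliases` endpoint. The helper name `build_alias_actions` is made up for illustration, and I lowercase the names here since Elasticsearch index names have to be lowercase.

```python
import json

def build_alias_actions(alias, prefix, first_year, last_year):
    """Build an `_aliases` request body with one "add" action per yearly index."""
    actions = [
        {"add": {"index": f"{prefix}{year}", "alias": alias}}
        for year in range(first_year, last_year + 1)
    ]
    return {"actions": actions}

# Hypothetical names: alias "log" over log2010 .. log2013.
body = build_alias_actions("log", "log", 2010, 2013)
print(json.dumps(body, indent=2))
```
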
But I think that when searching through this alias, all the indexes will be searched (one shard per index, thanks to routing...).
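
My understanding of why routing gives one shard per index: the shard is picked as a hash of the routing value modulo the number of primary shards. A small Python sketch of that idea, where `zlib.crc32` is only a stand-in (Elasticsearch uses its own hash function), `shard_for` is a hypothetical helper, and the index names and user id are made up:

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Pick a shard as hash(routing) % num_primary_shards.

    zlib.crc32 is a stand-in hash for illustration only; Elasticsearch
    uses a different hash internally, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

indexes = [f"log{year}" for year in range(2010, 2014)]
# With a routing value, each index contributes exactly one shard to the search.
targets = {index: shard_for("user_42", 20) for index in indexes}
print(targets)
```
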
That may not be the best solution. I thought alias filters might be a way to select which indexes get queried, but I don't think that is what alias filters are for.
If we do this with ten years of data, a single query will hit 10 indexes to produce the results (which are mainly aggregations).
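
To put numbers on that, and to show the kind of selection I was hoping for, here is a rough Python sketch. It assumes one index per year (names lowercased, as Elasticsearch requires for index names), 20 shards per index, and a search that names several indexes separated by commas with a `routing` query parameter. `indexes_for_years` and `search_path` are hypothetical helpers, not part of any client library.

```python
def indexes_for_years(prefix, first_year, last_year):
    """Names of the yearly indexes covering an inclusive range of years."""
    return [f"{prefix}{year}" for year in range(first_year, last_year + 1)]

def search_path(indexes, user_id):
    """Search path that targets only the given indexes, routed by user."""
    return f"/{','.join(indexes)}/_search?routing={user_id}"

SHARDS_PER_INDEX = 20  # the value we are considering

all_years = indexes_for_years("log", 2010, 2019)  # ten years behind the alias
shards_without_routing = len(all_years) * SHARDS_PER_INDEX
shards_with_routing = len(all_years)  # one shard per index

# A query that only concerns 2012-2013 could name just those two indexes.
path = search_path(indexes_for_years("log", 2012, 2013), "user_42")
print(path)
print(shards_without_routing, shards_with_routing)
```

So through the alias, a routed query would touch 10 shards instead of 200, and naming the indexes directly would cut that further when the query only covers a few years.
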
Is this a good strategy?
How would you tackle this ?
We thought of defining 20 shards per index... Is that too many or too few?
Thanks for your help