Hi guys,
I have 30 TB of Elasticsearch data spread across 11 nodes. Indices are split by month, each with 5 primary shards and 5 replica shards. Shard sizes are very uneven: some are big (over 200 GB) and some are small (under 10 GB). The problem is that searching across all the data is slow: a single 6-word query takes more than a minute and maxes out the load on every node, and running a second search on top of it pushes the time to 2-3 minutes.
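For context, this is how I've been checking the per-shard sizes, sorted largest first (standard `_cat/shards` API; `logs-*` is just a placeholder for my real monthly index pattern):

```
GET _cat/shards/logs-*?v&s=store:desc&h=index,shard,prirep,store,node
```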
I was thinking of going to 12 nodes and pairing them up, so that each pair holds one year's worth of data: one node for the primaries and the other for the replicas. I'm hoping that this way the data can be cached better for searches and less communication between nodes is needed. By default a search would still hit all the data, but it would also be possible to search only the last year or the last two years (see the sketch below).
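As a sketch of what I mean, I believe shard allocation filtering could pin a year's indices to a node pair; the `year_group` attribute name and the `logs-2023-*` pattern here are only placeholders for illustration:

```
# elasticsearch.yml on the two nodes that should hold the 2023 data:
node.attr.year_group: 2023
```

```
# Then require all 2023 monthly indices to live on those nodes:
PUT logs-2023-*/_settings
{
  "index.routing.allocation.require.year_group": "2023"
}
```

Searching only the recent data would then just mean querying `logs-2024-*` (or an alias covering the last two years) instead of `logs-*`.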
However, my nodes don't have much memory (around 32 GB each, half of which goes to the JVM heap).
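For reference, the heap is pinned the usual way in `jvm.options`, so the other ~16 GB per node is left for the OS filesystem cache that Lucene reads from:

```
# jvm.options: fixed 16 GB heap (min = max to avoid resizing pauses)
-Xms16g
-Xmx16g
```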
Do you guys think this is a good approach? If not, what else can I do to improve the search time?
Thanks in advance!