I have an index with roughly 60 million documents, and more are added daily (about half a million every 2-3 months in total).
The field being searched is a text field that can hold a large amount of text (anything from 3-4 pages up to 200 pages), so as you can see, a query can match a lot of terms across shards. Searches against this index also use highlighting to show the terms matched from the query string.
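For context, highlighting on fields this large is often the expensive part: by default the highlighter has to re-analyze the whole stored text for every hit. One commonly suggested mitigation is to store term vectors on the field so the fast vector highlighter can be used instead. Below is a minimal sketch of the mapping and query bodies involved; the field name `content` and the fragment settings are assumptions, not my actual setup.

```python
# Sketch: request bodies for faster highlighting on large text fields.
# The field name "content" and the fragment settings are hypothetical.
# Storing term vectors ("with_positions_offsets") allows the fast vector
# highlighter ("fvh"), which avoids re-analyzing pages of text per hit.

mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "term_vector": "with_positions_offsets",
            }
        }
    }
}

query = {
    "query": {"match": {"content": "search terms"}},
    "highlight": {
        "fields": {
            "content": {
                "type": "fvh",             # fast vector highlighter
                "fragment_size": 150,      # keep fragments small
                "number_of_fragments": 3,  # cap highlighting work per hit
            }
        }
    },
}
```

Note the trade-off: term vectors increase index size, but since indexing speed is not a concern here, that may be acceptable.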
My question is: what should I do first to speed up searches on this index, and where should I go from there? Even if that means sacrificing indexing speed, it doesn't really affect my workflow. The main thing I need is faster search.
At the moment the index has only 5 primary shards, and I don't really know how many shards I should reindex it with. Maybe one shard per million documents?
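For what it's worth, the sizing guidance I've usually seen is based on shard size on disk (roughly 10-50 GB per shard) rather than document count. A quick back-of-envelope, where the total index size is a made-up placeholder to be replaced with the real primary store size from `GET _cat/indices`:

```python
# Back-of-envelope primary shard count from a target shard size,
# not from document count.
total_index_gb = 600   # HYPOTHETICAL primary data size; use the real value
target_shard_gb = 40   # common guidance: roughly 10-50 GB per shard

primaries = max(1, round(total_index_gb / target_shard_gb))
print(primaries)  # -> 15
```

So the right shard count depends on how many GB those 60 million documents actually occupy, which matters much more than one-shard-per-million-docs rules of thumb.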
I also have 3 data nodes hosting the data, and the index is replicated on each node (1 primary and 2 replicas). Would I see an increase in search speed if I added more data nodes and replicated the shards onto them? Or is this level of redundancy enough, with anything past it hitting diminishing returns?
The data nodes currently run with 2 CPUs and 8 GB of RAM. I've heard that RAM is usually the main bottleneck for Elasticsearch search, but it seems that highlighting taxes the CPU more than the RAM (this is based on my own observations; anyone who can prove this wrong is more than welcome and would actually help me).
I don't think I/O is an issue here either, as 1500 IOPS seems to be enough at the moment (again, anyone who can dispute this is welcome).
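To back up these observations with numbers, I've been looking at the per-node figures from `GET _nodes/stats/os,jvm` (CPU percent and JVM heap used percent). A trivial sketch of the comparison, using a hypothetical excerpt of that response rather than live values:

```python
# Sketch: comparing CPU load vs JVM heap pressure from node stats.
# The numbers below are a HYPOTHETICAL excerpt of a single node's
# GET _nodes/stats/os,jvm response; substitute real values.
node_stats = {
    "os": {"cpu": {"percent": 92}},
    "jvm": {"mem": {"heap_used_percent": 55}},
}

cpu = node_stats["os"]["cpu"]["percent"]
heap = node_stats["jvm"]["mem"]["heap_used_percent"]
bottleneck = "cpu" if cpu > heap else "heap"
print(bottleneck)  # -> cpu
```

With numbers like these during highlighting-heavy queries, CPU looks like the tighter constraint, which matches what I'm seeing.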
I just need to know which path to take first, as I don't have a lot of financial resources to work with, but if I can gather enough evidence I might be able to change that.
If more information is needed I will gladly provide it, but I thought these details were the most relevant to the situation.
Thank you in advance.