I currently run an Elasticsearch cluster on a single node with SSD storage.
It holds 4B records split over 25 primary shards in a single index (no replicas).
The ID of every record is set manually: it is unique and known beforehand by both the indexer and the searcher, and I was advised that fetching directly by ID would give the best performance. All data must be searched every time, so a hot/cold or rolling index setup is not an option.
To paint a picture: think of the ID as the md5 checksum of a file, with the document containing tags and metadata about that file.
Fetching 1,000 and 2,000 IDs at a time takes around 4 and 8 seconds respectively, which I believe is fairly slow, but I am not sure what the exact bottleneck is. The first thing I'd like to rule out is that using mget with predefined IDs is the root cause, and that searching on a keyword field containing the md5 checksum (or any other method) would be faster.
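For reference, here is a rough sketch of the request bodies I mean when comparing mget against a search. The index name `files` and the keyword field `md5` are placeholders I made up for illustration (my real mapping may differ), and the code only builds the JSON bodies rather than sending them.

```python
import json

INDEX = "files"     # hypothetical index name, for illustration only
MD5_FIELD = "md5"   # hypothetical keyword field holding the same value as _id


def mget_body(ids):
    """Body for POST /<index>/_mget: fetch documents directly by _id."""
    return {"ids": list(ids)}


def ids_query_body(ids):
    """Body for POST /<index>/_search using the `ids` query on _id."""
    ids = list(ids)
    return {"size": len(ids), "query": {"ids": {"values": ids}}}


def terms_query_body(ids, field=MD5_FIELD):
    """Body for POST /<index>/_search using a `terms` query on a keyword field."""
    ids = list(ids)
    return {"size": len(ids), "query": {"terms": {field: ids}}}


if __name__ == "__main__":
    # Two sample 32-character hex strings standing in for real checksums.
    sample = [
        "d41d8cd98f00b204e9800998ecf8427e",
        "0cc175b9c0f1b6a831c399e269772661",
    ]
    print(f"POST /{INDEX}/_mget")
    print(json.dumps(mget_body(sample), indent=2))
    print(f"POST /{INDEX}/_search  (ids query)")
    print(json.dumps(ids_query_body(sample), indent=2))
    print(f"POST /{INDEX}/_search  (terms query on {MD5_FIELD})")
    print(json.dumps(terms_query_body(sample), indent=2))
```

Timing the same batch of IDs through each of these three bodies should show whether mget itself is the slow part or whether all three are bounded by the same thing.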
If the issue is hardware, which component would yield the biggest performance increase if upgraded, and why? CPU utilization is very low overall, but disk read/write is often 500-600 MB/s.
R710 system with: 2x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
32GB DDR3 RAM with 24GB allocated to Elasticsearch
2x2TB SSD (RAID 0)
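To put the figures above in perspective, here is a back-of-the-envelope calculation. It uses only the rough numbers from this post (4B records, 25 shards, about 4 s per 1,000 IDs and about 8 s per 2,000 IDs); the even spread of IDs across shards is an assumption on my part, not something I have measured.

```python
TOTAL_DOCS = 4_000_000_000   # ~4B records, as stated above
PRIMARY_SHARDS = 25

# Approximate observed timings: batch size -> seconds per request.
OBSERVED = {1000: 4.0, 2000: 8.0}


def per_id_ms(seconds, batch_size):
    """Average wall-clock milliseconds spent per requested ID."""
    return seconds * 1000 / batch_size


docs_per_shard = TOTAL_DOCS // PRIMARY_SHARDS

if __name__ == "__main__":
    print(f"documents per shard: {docs_per_shard:,}")
    for batch_size, seconds in OBSERVED.items():
        # Assumes IDs are spread evenly over the shards.
        ids_per_shard = batch_size / PRIMARY_SHARDS
        print(
            f"{batch_size} IDs in {seconds:.0f} s: "
            f"{per_id_ms(seconds, batch_size):.1f} ms per ID, "
            f"about {ids_per_shard:.0f} IDs per shard"
        )
```

So each shard holds about 160 million documents, and the cost works out to roughly 4 ms per ID at both batch sizes, which suggests the time grows linearly with the number of IDs requested.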