This depends a lot on the use case, but also on the type of storage you have and the load you expect the cluster to be under. To get an answer I therefore think you need to provide more details or run some tests.
Map index: around 5 million docs, roughly 30 concurrent users expected, light search load.
Huge index: perhaps more than 10 million docs, used for data processing, so search requests will keep hitting the nodes. We have run into search queue capacity (rejected execution) errors before.
That is still not nearly enough information. The size of the data set and the type of data and queries have an impact, as do the query and indexing load and the latency requirements.
Ideally you want a well-balanced node. There is little point in having lots of CPU if you have slow storage, as that limits how fast you can retrieve data to process. If, on the other hand, you have a small data set that can be cached, you may be limited by CPU even with slow storage, since disk I/O will be infrequent.
I would recommend running a test to see how much CPU your use case uses, or is able to use, and make sure that you have at least that amount to ensure CPU is not a bottleneck.
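A minimal sketch of such a test in Python, assuming you swap the `run_query` stub for a real search call (e.g. `es.search(...)` from the elasticsearch client — the stub here just simulates ~5 ms of work so the script is self-contained):

```python
# Hedged load-test sketch: fire concurrent searches, report throughput and
# latency percentiles. Watch CPU on the cluster nodes while this runs.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def run_query():
    # Placeholder (an assumption): replace with a real search request,
    # e.g. es.search(index="my-index", query={...}). Simulates ~5 ms work.
    time.sleep(0.005)

def load_test(concurrency=30, requests_per_worker=20):
    latencies = []  # list.append is thread-safe in CPython

    def worker():
        for _ in range(requests_per_worker):
            t0 = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - t0)

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(concurrency):
            pool.submit(worker)
    elapsed = time.perf_counter() - t0

    total = concurrency * requests_per_worker
    return {
        "throughput_qps": total / elapsed,
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": sorted(latencies)[int(len(latencies) * 0.95)] * 1000,
    }

if __name__ == "__main__":
    print(load_test())
```

While the test runs, you can also check whether search queue rejections (the error mentioned above) are occurring, e.g. via `GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected`. Ramp `concurrency` up until throughput stops increasing; at that point either CPU, storage, or the search queue is the bottleneck.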