These types of questions are a bit hard to answer, since it depends on what type of data you are indexing, what indexing speed is acceptable, and what type of searches you plan to do.
In general, I would recommend using the local gateway with replicas rather than a NAS with the shared file system gateway. The local gateway usually provides better performance, and replicas play a major role when searching as well.
10 million docs and 200GB of data is not really huge :). So I think you can start with 2-4 nodes, but it really depends on the spec of the machines. In general, you want to give elasticsearch as much memory as possible while still leaving some for the file system cache. For example, on a machine with 16GB of memory, you can allocate 6-8GB to elasticsearch and leave the rest for the file system cache.
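As a rough sketch of that rule of thumb, the small helper below turns the 16GB example into fractions of total memory. The function name and the assumption that the 6-8GB range scales proportionally to other machine sizes (37.5% to 50% of RAM) are mine for illustration, not an elasticsearch setting or an official formula.

```python
def suggest_heap_gb(total_ram_gb, low_fraction=0.375, high_fraction=0.5):
    """Return (low_heap_gb, high_heap_gb, min_fs_cache_gb).

    Assumption: the 6-8GB heap suggested for a 16GB machine scales
    proportionally, i.e. 37.5% to 50% of total RAM goes to elasticsearch
    and the remainder is left for the file system cache.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    low = total_ram_gb * low_fraction
    high = total_ram_gb * high_fraction
    # Memory still available to the file system cache at the largest heap.
    min_fs_cache = total_ram_gb - high
    return low, high, min_fs_cache


if __name__ == "__main__":
    low, high, cache = suggest_heap_gb(16)
    print(f"heap: {low:g}-{high:g}GB, file system cache: at least {cache:g}GB")
```

Treat the result as a starting point only; the right split still depends on the data and the searches you run.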
To best answer your question, I suggest running simple tests with a smaller sample of your documents and seeing how it goes.
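One way to use such a test run is to extrapolate from the sample to the full dataset. The sketch below assumes indexing time and index size grow linearly with document count, which is only a first approximation; the function name and the sample figures are hypothetical placeholders, not measurements.

```python
def extrapolate_from_sample(sample_docs, sample_seconds, sample_index_gb, target_docs):
    """Linearly scale a sample run up to the target document count.

    Assumption: indexing time and on-disk index size both grow in
    proportion to the number of documents. Real runs can deviate from this.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    scale = target_docs / sample_docs
    return sample_seconds * scale, sample_index_gb * scale


if __name__ == "__main__":
    # Hypothetical sample run: 100,000 docs indexed in 120s, producing 2GB.
    seconds, index_gb = extrapolate_from_sample(100_000, 120.0, 2.0, 10_000_000)
    print(f"estimated indexing time: {seconds / 3600:.1f}h, index size: {index_gb:g}GB")
```

If the estimate from a small sample and from a larger one disagree noticeably, the linear assumption does not hold for your data and the larger run is the better guide.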
On Wednesday, March 16, 2011 at 2:05 PM, srrIN wrote:
Hi Shay,
We are planning to load a huge dataset up in the end of this week. What would be your recommendation for loading 10 Million documents with 200 GB of Index size? We are not planning to have Replica and Gateway would be pointing to NAS. Kindly let me know the details for
1. Number of Nodes that has to be configured
2. Number of Shards that has to be configured
3. Maximum and Minimum RAM for running Elastic Search.
Let me know anything else to be considered too. FYI, Currently I am using ES 0.15.2 version. Thanks for your understanding.
Thanks
SRR.