ES Recommended Configuration?

Hi Shay,
We are planning to load a huge dataset at the end of this week. What would you recommend for loading 10 million documents with an index size of 200 GB? We are not planning to use replicas, and the gateway would point to NAS.

Kindly let me know the details for:

  1. Number of nodes to configure
  2. Number of shards to configure
  3. Maximum and minimum RAM for running Elasticsearch

Let me know if there is anything else to consider. FYI, I am currently using ES version 0.15.2.

Thanks for your understanding.

Thanks
SRR.

These types of questions are a bit hard to answer, since the answer depends on what type of data you are indexing, what indexing speed is acceptable, and what types of searches you plan to run.

In general, I would recommend using the local gateway with replicas rather than NAS with the shared file system gateway. It usually provides better performance, and replicas play a major role when searching as well.
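
As a minimal sketch of the replica side of that advice, the snippet below builds the JSON body one might send when creating an index. `number_of_shards` and `number_of_replicas` are standard Elasticsearch index settings; the function name and the values used are illustrative assumptions, not recommendations from this thread. The gateway type is a node-level setting and is not covered here.

```python
import json


def index_settings_body(shards: int, replicas: int) -> str:
    """Return a JSON body for index creation with the given shard layout.

    Hypothetical helper for illustration; only the two setting names
    come from Elasticsearch itself.
    """
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Illustrative values only (assumption): 6 primaries, 1 replica of each.
    print(index_settings_body(shards=6, replicas=1))
```

With at least one replica, every shard has a second copy on another node, which is what lets a cluster survive the loss of a single node.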

10 million docs and 200 GB of data is not really huge :). So I think you can start with 2-4 nodes, but it really depends on the spec of the machines. In general, you want to allocate as much memory as possible to elasticsearch while still leaving some for the file system cache. For example, on a machine with 16 GB of memory, you can allocate 6-8 GB to elasticsearch and leave the rest for the file system cache.
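
The 16 GB example can be written out as simple arithmetic. The sketch below assumes the same 6/16 to 8/16 ratio carries over to machines of other sizes, which is an extrapolation rather than something stated above; the function name is hypothetical.

```python
from typing import Tuple


def heap_range_gb(total_ram_gb: float,
                  low_fraction: float = 6 / 16,
                  high_fraction: float = 8 / 16) -> Tuple[float, float]:
    """Return a (low, high) heap size in GB for a machine with total_ram_gb.

    The default fractions reproduce the 6-8 GB range on a 16 GB machine.
    Applying them to other machine sizes is an assumption.
    """
    if total_ram_gb <= 0:
        raise ValueError("total RAM must be positive")
    if not 0 < low_fraction <= high_fraction < 1:
        raise ValueError("fractions must satisfy 0 < low <= high < 1")
    return (total_ram_gb * low_fraction, total_ram_gb * high_fraction)


if __name__ == "__main__":
    low, high = heap_range_gb(16)
    # Whatever is not given to the heap stays available to the OS file cache.
    print(f"heap: {low:.0f}-{high:.0f} GB, "
          f"file system cache: {16 - high:.0f}-{16 - low:.0f} GB")
```

Treat the result as a starting point for the test runs, not a fixed rule.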

To best answer your question, I suggest running simple tests with a smaller sample of documents and seeing how it goes.
On Wednesday, March 16, 2011 at 2:05 PM, srrIN wrote: (original message, quoted in full above)

Hi Shay,
Thank you for the reply.

Here is my detailed specification:

Servers 1, 2 & 3:
Intel Xeon 5620 processor, 2.40 GHz
24 GB of RAM
450 GB SAS HDD
Elasticsearch data node on each of the three servers, configured with 6 shards and 5 replicas

Server 4:
Intel Xeon 5670 processor, 2.93 GHz
48 GB of RAM
450 GB SAS HDD
Elasticsearch non-data node, plus a Windows service that communicates with ES for indexing and for a few functions such as updating and faceting.

My ES data size after indexing would be 450 GB.
With the previous version of ES we configured the local gateway, and we lost data when ES crashed because of issues such as running out of Java heap space (OOM). The data loss happened because we had no replicas configured and the gateway was local.

In the end we decided to point the gateway to NAS and configure replicas for safety.
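
As a rough sanity check on the numbers above, the sketch below works out the storage such a layout implies. It assumes the 450 GB figure is primary data only and that each replica is a full copy of it. It also relies on Elasticsearch not placing two copies of the same shard on one node, so with 3 data nodes at most 2 replicas of each shard can be assigned. The function and key names are hypothetical.

```python
def plan_storage(primary_gb: float, replicas: int,
                 data_nodes: int, disk_per_node_gb: float) -> dict:
    """Compare the disk a shard layout needs with the disk available."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    # One primary copy plus one full copy per configured replica.
    needed_gb = primary_gb * (1 + replicas)
    # Copies of the same shard never share a node, so the number of
    # replicas that can actually be assigned is capped at data_nodes - 1.
    assignable = min(replicas, data_nodes - 1)
    assigned_gb = primary_gb * (1 + assignable)
    return {
        "needed_gb": needed_gb,
        "assignable_replicas": assignable,
        "assigned_gb": assigned_gb,
        "raw_disk_gb": data_nodes * disk_per_node_gb,
    }


if __name__ == "__main__":
    # Figures from the specification above: 450 GB of data, 5 replicas,
    # 3 data nodes with a 450 GB disk each.
    for key, value in plan_storage(450, 5, 3, 450).items():
        print(f"{key}: {value}")
```

Under these assumptions, the assignable copies alone would already fill the raw disk on the three data nodes, leaving no headroom for segment merges or growth.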

  1. What are your views on my plan?
  2. If I configure the local gateway with replicas, is there any chance of data loss when ES crashes for various reasons?
  3. We indexed a sample dataset in a 3-server test environment (same spec as above), which holds a 150 GB index with 3 million documents, with ES configured as follows:
    1. Server 1 - Data Node 1 and Data Node 2 (each node with 10 GB for ES)
    2. Server 2 - Data Node 3 and Data Node 4 (each node with 10 GB for ES)
    3. Server 3 - Data Node 5 and a non-data node (each node with 10 GB for ES)
    4. Gateway configured to NAS
      We found some slowness when issuing parent/child queries. What are your thoughts?

Thanks
SRR