Hi Guys,
I have 4 nodes in my ES cluster. They are big boxes with 24GB of memory and
32TB of disk each. ES is configured with a 12GB heap, and I have done
extensive testing, so I am happy with the current setup.
This ES cluster is connected to my Hadoop cluster over good 10GbE links
throughout. The Hadoop cluster has 12 nodes, and I use Logstash to move
historical logs off the Hadoop cluster into ES.
Given these assumptions:
- I have lots of disk space per machine, so I do not expect to run out of disk space.
- The user query load is very light; this is for ad hoc research, not production.
- I will have several years of data, so I was planning on one index per month, e.g. logstash-2014.09.
- I do not care too much about replication, as all the data is on the Hadoop cluster. On failure I will re-index.
Question:
- How many shards should I aim for per index?
I was thinking of FOUR per index, on the assumption that this works out to
ONE shard per node.
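If that plan holds, an index template can pin every monthly index to 4 shards and no replicas up front. A minimal sketch against the ES 1.x template API (the template name and the localhost endpoint are just placeholders, not my actual setup):

# hypothetical template name; point the URL at any node in the cluster
curl -XPUT 'http://localhost:9200/_template/logstash_monthly' -d '
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 0
  }
}'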
When I load the data from Hadoop, I do it via a streaming map-reduce job,
using the Logstash netcat route with 3 Hadoop nodes pointing at 1 ES node
(a rough sketch of the Logstash config follows the diagram).
For this reason, 1 shard per node seems like a good idea?
e.g.:
hadoop1 streaming mapper ----> logstash on hadoop ----> netcat ----> ES node1
hadoop2 streaming mapper ----> logstash on hadoop ----> netcat
hadoop3 streaming mapper ----> logstash on hadoop ----> netcat
hadoop4 streaming mapper ----> logstash on hadoop ----> netcat ----> ES node2
hadoop5 streaming mapper ----> logstash on hadoop ----> netcat
hadoop6 streaming mapper ----> logstash on hadoop ----> netcat
...and so on
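Concretely, the Logstash end of that route boils down to a tcp listener feeding the elasticsearch output. A simplified sketch, not my exact config (the port, host name, codec, and monthly index pattern are all illustrative):

input {
  tcp {
    port => 5000           # the mappers pipe lines to this port via netcat (port is illustrative)
    codec => "json_lines"  # assumes one JSON event per line; adjust to the actual log format
  }
}
output {
  elasticsearch {
    protocol => "http"
    host => "es-node1"                # illustrative ES node
    index => "logstash-%{+YYYY.MM}"   # monthly indices instead of the daily default
  }
}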
Thanks