After a lot of testing, I am going to move to production.
Is it OK to keep the default of 5 shards for large indices around 50 GB in size with 150 million documents?
The cluster I have set up is pretty big: 8 nodes, with three master-eligible nodes, and all 8 nodes acting as data nodes as well.
I might have 10 more large indices of this type on this cluster, plus 50 more small indices.
I want to get a better idea before I dump all of this into production. I know I don't have any disk space or memory issues, as all my systems have a good amount of memory and disk space.
I just want to make sure I won't run into big performance issues with this type of setup.
All three master servers are acting as Logstash and Kibana servers as well.
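For context, a minimal sketch of creating one of these indices with an explicit primary shard count (using the Python requests library against a local node; the node URL and index name are placeholders) would look like this:

```python
import requests

# Placeholder node URL and index name; adjust to your environment.
ES_URL = "http://localhost:9200"
INDEX = "my-large-index"

# The primary shard count is fixed when the index is created, so set it
# explicitly instead of relying on the default of 5.
body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    }
}
resp = requests.put(f"{ES_URL}/{INDEX}", json=body)
print(resp.json())
```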
How many indices/shards are you planning to create every day? How long do you intend to keep data in the cluster? What is the specification of your Elasticsearch nodes?
I would recommend having a look at the following resources:
This, in my opinion, sounds like a bad idea unless you have very powerful nodes. Even in that case, I would put the additional processes on nodes that are not master-eligible. Elasticsearch assumes all nodes are equal by default, and here you are taking resources away from the Elasticsearch nodes you want to be the most stable.
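If it helps, a quick way to see each node's current roles is the _cat/nodes API; a minimal sketch with the Python requests library (the node URL is a placeholder):

```python
import requests

# Placeholder node URL; point this at any node in the cluster.
ES_URL = "http://localhost:9200"

# _cat/nodes lists each node's roles ("m" = master-eligible, "d" = data)
# and marks the currently elected master with "*" in the master column.
resp = requests.get(f"{ES_URL}/_cat/nodes",
                    params={"v": "true", "h": "name,node.role,master"})
print(resp.text)
```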
All systems have 98 GB of RAM; three of them are on a 10-gigabit network, and the rest are on a gigabit network.
Some of the storage is on 10k RPM disks, some on 7.2k RPM. Most of the machines have dual-socket, four-core CPUs.
No new indices will be created very often; I will mostly be updating existing indices, each of which has five shards. I am going to keep only one year of data and, once a month, delete older data from the indices using delete_by_query.
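For reference, a minimal sketch of that monthly cleanup (using the Python requests library; the node URL, index name, and @timestamp field are placeholders, not the real mapping):

```python
import requests

# Placeholder node URL, index name, and date field; adjust to the real mapping.
ES_URL = "http://localhost:9200"
INDEX = "my-large-index"

# Delete every document with a timestamp older than one year.
# "now-1y/d" rounds down to the start of the day so repeated runs behave
# predictably.
query = {
    "query": {
        "range": {
            "@timestamp": {"lt": "now-1y/d"}
        }
    }
}
resp = requests.post(f"{ES_URL}/{INDEX}/_delete_by_query", json=query)
print(resp.json())  # the response reports how many documents were deleted
```

(A side note on the design choice: delete_by_query removes documents one by one, which is more expensive than dropping whole time-based indices, but it matches the single-index layout described here.)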
Three systems are in the USA, three in the UK, and three in Singapore.
I just checked my biggest index (19.2 GB); the largest shard for that index is 4.4 GB. I do not expect that to more than double even if I keep two years of records. This size is just one year of records (2018).
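Something along these lines (a sketch with the Python requests library; the node URL is a placeholder) can be used to check index and shard sizes:

```python
import requests

# Placeholder node URL.
ES_URL = "http://localhost:9200"

# Per-index totals: document count and on-disk size.
print(requests.get(f"{ES_URL}/_cat/indices",
                   params={"v": "true", "h": "index,docs.count,store.size"}).text)

# Per-shard breakdown: primary vs. replica, size, and the node it lives on.
print(requests.get(f"{ES_URL}/_cat/shards",
                   params={"v": "true", "h": "index,shard,prirep,store,node"}).text)
```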
Elasticsearch requires low latencies between nodes, so deploying a cluster across data centres that far apart is neither recommended nor supported. I would therefore recommend setting up a separate cluster per region.
The recommended shard size is around 8-10 GB, so that relocation or initialization of shards (if they go unassigned) is fast. So for a 50 GB index, the number of shards could be around 6.
If you are going to keep data in the cluster for a long time, try to aim for an average shard size of between 30 GB and 50 GB. In your initial description it sounded like you were going to create a good number of smaller indices, which I would recommend against.
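To make that concrete, here is a small back-of-the-envelope sketch in Python (the figures come from this thread; the 40 GB target is simply the midpoint of the 30-50 GB range above):

```python
# Back-of-the-envelope primary shard count from a projected index size and a
# target shard size. 40 GB is the midpoint of the 30-50 GB guidance above.
def primary_shards(projected_index_gb, target_shard_gb=40):
    # Round up, but never go below one primary shard.
    return max(1, -(-projected_index_gb // target_shard_gb))

print(primary_shards(50))   # today's ~50 GB index  -> 2 primaries
print(primary_shards(80))   # two years at ~80 GB   -> 2 primaries
print(primary_shards(500))  # a much larger index   -> 13 primaries
```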
You can close this. The large cluster has been broken into separate clusters; shard size is no more than 4.4 GB, and the max size for an index is 41 GB. Even if I keep two years of records, it will be 80 GB and the shard size will be 8 GB.