Setting up a new Elasticsearch 1.7 cluster

(ganeshbabu) #1


The ES_DEV cluster runs 2 JVMs on each of 3 servers: 1 master node and 1 data node per server.
Server 1: master, data, and client
Server 2: master and data
Server 3: master and data
Server 1 runs a third JVM that acts as a client node.
I have configured the Elasticsearch servers (ES 1.7) with Java 8. Each server has a large amount of RAM (256 GB), and we have set ES_HEAP_SIZE to 16 GB. Each server has 48 CPU cores. We have created 4 indices with 5 shards each; replicas are set to "0" during bulk indexing, and after bulk indexing we explicitly change replicas to "1" in the yml config file.

Total documents: 100 million
Primary size: 750 GB
My config file details:
index.refresh_interval: -1
action.disable_delete_all_indices: true
indices.fielddata.cache.size: 75%
indices.breaker.fielddata.limit: 85%
bootstrap.mlockall: true
http.max_content_length: 500mb
I also enabled doc_values for not_analyzed fields in the mappings.
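For reference, a mapping along these lines enables doc_values on a not_analyzed string field in ES 1.x (the index, type, and field names here are placeholders, not taken from the original post):

```
PUT /my_index/_mapping/my_type
{
  "properties": {
    "status": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": true
    }
  }
}
```

In 1.x, doc_values must be enabled per field like this; it only became the default for not_analyzed fields in later versions.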

My questions:

  1. Is this the correct way to set up the cluster?
  2. What is the ideal number of shards?
  3. Should I change any config settings for better bulk indexing and search query performance?
  4. Will ES_HEAP_SIZE=16 GB help resolve out-of-memory errors?

Please suggest any other config settings that would improve ES performance.


(Magnus Bäck) #2
  1. What's "correct" depends on the context. It's a bit weird that you're setting index.refresh_interval to -1. Disabling the periodic refreshes via elasticsearch.yml is rather drastic (even though it can be overridden as a per-index setting).
  2. Having more than one shard per index per node is probably not useful for performance unless you plan on adding additional nodes in the future or if the shard size is or might become too big. You'll want to keep shards at most a few tens of gigabytes in size.
  3. Nothing I can think of.
  4. What heap size you need depends on many factors like the query load, the document size, field mappings and so on.
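To make point 2 concrete: 750 GB of primary data spread over 4 indices with 5 shards each is 20 primary shards, i.e. roughly 37 GB per shard on average, which is within the "few tens of gigabytes" guideline. A create-index request along these lines (the index name is a placeholder) fixes the shard count at creation time, since it cannot be changed later:

```
PUT /my_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
```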

(ganeshbabu) #3

Thanks for your response, @magnusbaeck.

We set index.refresh_interval to -1 only during bulk loading; after that we change it to 10s.

During bulk indexing, frequent refreshes were causing heavy I/O, which slowed indexing.
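This kind of toggling is typically done per index through the update-settings API rather than in elasticsearch.yml, so it takes effect without a restart. A sketch of the two calls (the index name is a placeholder):

```
# Before bulk loading: disable refresh and replicas
PUT /my_index/_settings
{ "index": { "refresh_interval": "-1", "number_of_replicas": 0 } }

# After bulk loading: restore refresh and add a replica
PUT /my_index/_settings
{ "index": { "refresh_interval": "10s", "number_of_replicas": 1 } }
```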


(system) #4