ES huge cluster setup advice: high RAM usage


(Ravi) #1

Hi everyone,

I'm trying to set up an ES cluster that can store at least 3 TB of data per day, ingested into the data nodes from 560 client servers.

Here is the basic ES cluster I have designed:

Filebeat => 3 Logstash servers => 5 ES data nodes

Filebeat Config

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.146.134.15:5044","10.146.134.16:5044","10.146.134.17:5044"]
  loadbalance: true
  worker: 2
  bulk_max_size: 2048

Logstash Configuration

 output {
   #stdout {codec => rubydebug}
   elasticsearch {
     hosts => ["10.146.134.24:8888","10.146.134.6:8888","10.146.134.7:8888","10.146.134.8:8888"]
     index => "logstash-%{client}-%{+YYYY.MM.dd}"
   }
 }

Elasticsearch configuration

[root@elk-es-ho-04 elasticsearch]# cat /etc/elasticsearch/elasticsearch.yml

cluster.name: QDN-Hillsboro-ELK
node.name: elk-es-ho-04
network.host: 10.146.134.24
http.port: 8888
discovery.zen.ping.unicast.hosts: ["10.146.134.12","10.146.134.13","10.146.134.14"]
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
bootstrap.memory_lock: true
path.data: /data/0
thread_pool.bulk.queue_size: 1000
thread_pool.search.queue_size: 10000

Indices settings

indices.memory.index_buffer_size: 30%

Problem: ram.percent goes up to 100%, which gives slow dashboards.
I have dedicated 62 GB of RAM to the JVM heap, as each server has 230 GB of built-in RAM.

[root@elk-es-ho-10 ~]# curl -XGET 10.146.134.11:8888/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.146.134.7 53 85 36 8.23 7.49 8.33 di - elk-es-ho-06
10.146.134.10 41 100 9 1.93 1.49 1.64 di - elk-es-ho-09
10.146.134.8 73 93 46 8.04 8.67 9.23 di - elk-es-ho-07
10.146.134.6 65 97 52 6.84 5.84 6.38 di - elk-es-ho-05
10.146.134.13 2 37 0 0.05 0.03 0.05 mi * elk-es-ho-12
10.146.134.11 2 36 0 0.00 0.01 0.05 i - elk-es-ho-10
10.146.134.14 2 36 0 0.00 0.01 0.05 mi - elk-es-ho-13
10.146.134.12 3 38 2 0.21 0.39 0.50 mi - elk-es-ho-11
10.146.134.24 69 94 40 6.01 7.46 8.04 di - elk-es-ho-04


(Christian Dahlqvist) #2

Does this mean that you are generating 560 indices per day? Is this done with the default 5 primary shards and 1 replica? What is your retention period?


(Ravi) #3

The reason I added index => "logstash-%{client}" is that I have tagged these client servers as "vector":

 mutate { add_field =>
          { client => "vector" }
        }

But now I assume I'm creating 560 indices every day. I haven't added all 560 servers yet, just 56.

I am changing the index setting to:

index => "logstash-%{type}-%{client}-%{+YYYY.MM.dd}"

Please advise.


(Christian Dahlqvist) #4

Having a large number of small indices and shards is very inefficient, as each shard comes with overhead. Try to organise your indices and sharding strategy so you get an average shard size between a few GB and a few tens of GB.
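For example, the shard count for new daily indices can be set with an index template (a sketch, assuming the Elasticsearch 5.x template syntax used in this thread's era; the template name "logstash-small" and the shard count of 2 are hypothetical examples, and the address is one of the cluster nodes above):

 curl -XPUT '10.146.134.11:8888/_template/logstash-small' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-*",
   "settings": {
     "index.number_of_shards": 2,
     "index.number_of_replicas": 1
   }
 }'

Templates only apply to indices created after the template is in place, so existing indices keep their current shard counts.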


(Ravi) #5

Just asking: I have 230 GB of RAM; how much JVM heap can I allocate to ES on each node?

Currently I have given 62 GB.


(Christian Dahlqvist) #6

The recommendation is to have a heap size of around 31GB so you can benefit from compressed pointers. In order to utilise the resources available on larger servers, it is however not uncommon to instead run multiple Elasticsearch nodes.
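A sketch of that setting in /etc/elasticsearch/jvm.options (both values should match so the heap never resizes at runtime):

 # /etc/elasticsearch/jvm.options
 -Xms31g
 -Xmx31g

Keeping the heap just under 32 GB is what preserves compressed ordinary object pointers; going to, say, 62 GB can actually leave less usable heap than 31 GB.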


(Ravi) #7

Thanks very much for your advice.


(Ravi) #8

Just to make sure, I have changed a few settings in elasticsearch.yml to increase the queue sizes:

[root@elk-es-ho-10 ~]# cat /etc/elasticsearch/elasticsearch.yml

Data 6 : 10.146.134.11

cluster.name: XXXX
node.name: elk-es-ho-
network.host: 10.146.134.11
http.port: 8888
discovery.zen.ping.unicast.hosts: ["10.146.134.12","10.146.134.13","10.146.134.14"]
node.master: false
node.data: false
discovery.zen.minimum_master_nodes: 2
bootstrap.memory_lock: true
thread_pool.bulk.queue_size: 10000
thread_pool.search.queue_size: 10000

Indices settings

indices.memory.index_buffer_size: 30%


(Ravi) #9

The cluster still runs with high RAM usage:

[root@elk-es-ho-10 ~]# curl -XGET 10.146.134.11:8888/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.146.134.6 63 95 90 37.59 21.73 11.86 di - elk-es-ho-
10.146.134.13 2 37 0 0.00 0.02 0.05 mi - elk-es-ho-12
10.146.134.12 3 38 2 0.64 0.55 0.48 mi * elk-es-ho-11
10.146.134.11 2 36 0 0.00 0.01 0.05 i - elk-es-ho-10
10.146.134.10 21 100 76 22.15 23.11 20.86 di - elk-es-ho
10.146.134.8 52 99 81 18.57 16.92 10.89 di - elk-es-ho
10.146.134.7 54 100 29 4.81 5.92 6.63 di - elk-es-ho-
10.146.134.24 49 86 22 3.58 4.44 4.43 di - elk-es-ho
10.146.134.14 2 36 0 0.00 0.01 0.05 mi - elk-es-ho-


(Christian Dahlqvist) #10

How many indices and shards do you have in the cluster? How much data? How much data are you indexing per second? What bulk size are you using?
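Those numbers can be pulled from the _cat APIs, e.g. (a sketch using one of the cluster addresses from the thread; the s parameter for sorting assumes a 5.1+ release):

 curl '10.146.134.11:8888/_cat/indices?v&s=store.size:desc'
 curl '10.146.134.11:8888/_cat/shards?v'
 curl '10.146.134.11:8888/_cat/allocation?v'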


(Ravi) #11

Sorry for the late reply, I was on holiday.
Here is the report from curl -XGET 10.146.134.12:8888/_cat/shards?v

logstash-custom_ats_2-vector-2017.07.05 8 r STARTED 43189469 63.5gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 8 p STARTED 43189470 63.3gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 3 r STARTED 43195576 63.4gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 3 p STARTED 43195576 63.3gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 11 p STARTED 43178812 63.2gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 11 r STARTED 43178812 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 4 p STARTED 43190597 63.4gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 4 r STARTED 43190597 63.2gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 1 p STARTED 43182353 63.1gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 1 r STARTED 43182353 63.4gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 7 r STARTED 43184131 63.4gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 7 p STARTED 43184131 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 10 r STARTED 43190307 63.5gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 10 p STARTED 43190307 63.2gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 6 r STARTED 43185770 63.5gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 6 p STARTED 43185770 63.1gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 2 r STARTED 43192902 63.5gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 2 p STARTED 43192902 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 5 p STARTED 43187653 63.3gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 5 r STARTED 43187653 63.5gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 9 p STARTED 43196568 63.3gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 9 r STARTED 43196568 63.4gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 0 p STARTED 43189187 63.3gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 0 r STARTED 43189189 63.5gb 10.146.134.10


(Christian Dahlqvist) #12

That seems to just be a single index. What is the total number of shards and data volume for the whole cluster, per day and in total?


(Ravi) #13

Yes, I have logs from 11 sources on the client servers, which are created as different indices as I mentioned.

In my Logstash configuration, my output looks like this:

 output {
   #stdout {codec => rubydebug}
   elasticsearch {
     hosts => ["10.146.134.24:8888","10.146.134.6:8888","10.146.134.7:8888","10.146.134.8:8888"]
     index => "logstash-%{type}-%{client}-%{+YYYY.MM.dd}"
   }
 }

Currently each index has 12 shards, each shard is around 60 GB, and storing all events in one index comes to around 2.1 TB of logs every day.

logstash-puppet-vector-2017.07.04 9 r STARTED 695 359.7kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 9 p STARTED 695 401.3kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 1 p STARTED 624 298.3kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 1 r STARTED 624 351.8kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 11 p STARTED 717 408.5kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 11 r STARTED 717 408.8kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 6 p STARTED 675 331.7kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 6 r STARTED 675 401.3kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 10 r STARTED 657 382.5kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 10 p STARTED 657 368.6kb 10.146.134.10 elk-es-ho-09
logstash-puppet-vector-2017.07.04 4 r STARTED 704 322.7kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 4 p STARTED 704 403.4kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 8 p STARTED 669 362.5kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 8 r STARTED 669 350.8kb 10.146.134.10 elk-es-ho-09
logstash-puppet-vector-2017.07.04 7 r STARTED 698 309.9kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 7 p STARTED 698 350.6kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 0 r STARTED 686 354.3kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 0 p STARTED 686 382.4kb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 9 r STARTED 4293057 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 9 p STARTED 4293057 4.2gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 10 r STARTED 4296013 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 10 p STARTED 4296013 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 11 r STARTED 4294355 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 11 p STARTED 4294355 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 5 r STARTED 4294367 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 5 p STARTED 4294367 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 1 r STARTED 4293620 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 1 p STARTED 4293620 4.1gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 2 p STARTED 4294387 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 2 r STARTED 4294387 4.2gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 7 p STARTED 4292952 4.2gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 7 r STARTED 4292952 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 8 p STARTED 4294534 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 8 r STARTED 4294534 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 4 p STARTED 4294611 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 4 r STARTED 4294611 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 3 p STARTED 4292768 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 3 r STARTED 4292768 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 6 r STARTED 4295711 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 6 p STARTED 4295711 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 0 r STARTED 4291871 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 0 p STARTED 4291871 4.2gb 10.146.134.10 elk-es-ho-09
logstash-error-vector-2017.06.26 11 p STARTED 771885 181.9mb 10.146.134.10 elk-es-ho-09

logstash-custom_ats_2-vector-2017.07.05 3 r STARTED 43205422 63.4gb 10.146.134.10 elk-es-ho-09
logstash-custom_ats_2-vector-2017.07.05 3 p STARTED 43205422 63.4gb 10.146.134.6 elk-es-ho-05
logstash-custom_ats_2-vector-2017.07.05 11 p STARTED 43188652 63.2gb 10.146.134.8 elk-es-ho-07
logstash-custom_ats_2-vector-2017.07.05 11 r STARTED 43188653 63.2gb 10.146.134.10 elk-es-ho-09
logstash-custom_ats_2-vector-2017.07.05 4 p STARTED 43200448 63.4gb 10.146.134.7 elk-es-ho-06


(Christian Dahlqvist) #14

Can you provide the full output of _cat/indices, e.g. via a gist?


(Ravi) #16

xtbd


(Christian Dahlqvist) #17

One of your indices is very much larger than all the others and could probably benefit from a larger number of primary shards. Most of your indices do however have relatively little data and are therefore oversharded. For most of them one or two primary shards would be sufficient and most likely more efficient.
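One way to sketch this with two templates, where the higher "order" lets the specific pattern override the general default (assuming ES 5.x template syntax; the template names and shard counts are hypothetical examples):

 curl -XPUT '10.146.134.11:8888/_template/logstash-default' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-*",
   "order": 0,
   "settings": { "index.number_of_shards": 1 }
 }'

 curl -XPUT '10.146.134.11:8888/_template/logstash-ats' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-custom_ats_2-*",
   "order": 1,
   "settings": { "index.number_of_shards": 12 }
 }'

With ~760 GB of primary data per day in the large index, 12 primaries keeps each shard in the recommended tens-of-GB range, while the small sources drop to a single shard each.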


(Ravi) #19

Great, I will give it a try now. Thanks again.


(system) #20

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.