ES huge cluster setup advice: high RAM usage


(Ravi) #1

Hi everyone,

I'm trying to set up an ES cluster that can store at least 3 TB of data per day, ingested into the data nodes from 560 client servers.

Here is the basic ES cluster I have designed:

Filebeat => 3 Logstash servers => 5 ES data nodes

Filebeat Config

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.146.134.15:5044","10.146.134.16:5044","10.146.134.17:5044"]
  loadbalance: true
  worker: 2
  bulk_max_size: 2048

Logstash Configuration

 output {
   #stdout {codec => rubydebug}
   elasticsearch {
     hosts => ["10.146.134.24:8888","10.146.134.6:8888","10.146.134.7:8888","10.146.134.8:8888"]
     index => "logstash-%{client}-%{+YYYY.MM.dd}"
   }
 }

Elasticsearch configuration

[root@elk-es-ho-04 elasticsearch]# cat /etc/elasticsearch/elasticsearch.yml

cluster.name: QDN-Hillsboro-ELK
node.name: elk-es-ho-04
network.host: 10.146.134.24
http.port: 8888
discovery.zen.ping.unicast.hosts: ["10.146.134.12","10.146.134.13","10.146.134.14"]
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
bootstrap.memory_lock: true
path.data: /data/0
thread_pool.bulk.queue_size: 1000
thread_pool.search.queue_size: 10000

Indices settings

indices.memory.index_buffer_size: 30%

Problem: ram.percent goes up to 100%, which gives slow dashboards.
I have dedicated 62 GB of RAM to the JVM heap, as each server has 230 GB of built-in RAM.

[root@elk-es-ho-10 ~]# curl -XGET 10.146.134.11:8888/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.146.134.7 53 85 36 8.23 7.49 8.33 di - elk-es-ho-06
10.146.134.10 41 100 9 1.93 1.49 1.64 di - elk-es-ho-09
10.146.134.8 73 93 46 8.04 8.67 9.23 di - elk-es-ho-07
10.146.134.6 65 97 52 6.84 5.84 6.38 di - elk-es-ho-05
10.146.134.13 2 37 0 0.05 0.03 0.05 mi * elk-es-ho-12
10.146.134.11 2 36 0 0.00 0.01 0.05 i - elk-es-ho-10
10.146.134.14 2 36 0 0.00 0.01 0.05 mi - elk-es-ho-13
10.146.134.12 3 38 2 0.21 0.39 0.50 mi - elk-es-ho-11
10.146.134.24 69 94 40 6.01 7.46 8.04 di - elk-es-ho-04


(Christian Dahlqvist) #2

Does this mean that you are generating 560 indices per day? Is this done with the default 5 primary shards and 1 replica? What is your retention period?


(Ravi) #3

The reason I added index => "logstash-%{client}" is that I have tagged these client servers as "vector":

 mutate { add_field =>
          { client => "vector" }
        }

But now I assume I'm creating 560 indices every day. I haven't added all 560 servers yet, just 56.

I am changing the index setting to:

index => "logstash-%{type}-%{client}-%{+YYYY.MM.dd}"

Please advise.


(Christian Dahlqvist) #4

Having a large number of small indices and shards is very inefficient, as each shard comes with overhead. Try to organise your indices and sharding strategy so you get an average shard size between a few GB and a few tens of GB.
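For example, the shard count for new daily indices can be set with an index template (a sketch, assuming the Elasticsearch 5.x template syntax used in this thread's era; the template name "logstash-small" and the shard count of 2 are hypothetical examples, and the address is one of the cluster nodes above):

 curl -XPUT '10.146.134.11:8888/_template/logstash-small' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-*",
   "settings": {
     "index.number_of_shards": 2,
     "index.number_of_replicas": 1
   }
 }'

Templates only apply to indices created after the template is in place, so existing indices keep their current shard counts.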


(Ravi) #5

Just asking: I have 230 GB of RAM; how much JVM heap can I allocate to ES on each node?

Currently I have given 62 GB.


(Christian Dahlqvist) #6

The recommendation is to have a heap size of around 31GB so you can benefit from compressed pointers. In order to utilise the resources available on larger servers, it is however not uncommon to instead run multiple Elasticsearch nodes.
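A sketch of that setting in /etc/elasticsearch/jvm.options (both values should match so the heap never resizes at runtime):

 # /etc/elasticsearch/jvm.options
 -Xms31g
 -Xmx31g

Keeping the heap just under 32 GB is what preserves compressed ordinary object pointers; going to, say, 62 GB can actually leave less usable heap than 31 GB.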


(Ravi) #7

Thanks very much for your advice.


(Ravi) #8

Just to make sure, I have changed a few settings in elasticsearch.yml to increase the queue sizes:

[root@elk-es-ho-10 ~]# cat /etc/elasticsearch/elasticsearch.yml

Data 6 : 10.146.134.11

cluster.name: XXXX
node.name: elk-es-ho-
network.host: 10.146.134.11
http.port: 8888
discovery.zen.ping.unicast.hosts: ["10.146.134.12","10.146.134.13","10.146.134.14"]
node.master: false
node.data: false
discovery.zen.minimum_master_nodes: 2
bootstrap.memory_lock: true
thread_pool.bulk.queue_size: 10000
thread_pool.search.queue_size: 10000

Indices settings

indices.memory.index_buffer_size: 30%


(Ravi) #9

The cluster still runs with high RAM usage:

[root@elk-es-ho-10 ~]# curl -XGET 10.146.134.11:8888/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.146.134.6 63 95 90 37.59 21.73 11.86 di - elk-es-ho-
10.146.134.13 2 37 0 0.00 0.02 0.05 mi - elk-es-ho-12
10.146.134.12 3 38 2 0.64 0.55 0.48 mi * elk-es-ho-11
10.146.134.11 2 36 0 0.00 0.01 0.05 i - elk-es-ho-10
10.146.134.10 21 100 76 22.15 23.11 20.86 di - elk-es-ho
10.146.134.8 52 99 81 18.57 16.92 10.89 di - elk-es-ho
10.146.134.7 54 100 29 4.81 5.92 6.63 di - elk-es-ho-
10.146.134.24 49 86 22 3.58 4.44 4.43 di - elk-es-ho
10.146.134.14 2 36 0 0.00 0.01 0.05 mi - elk-es-ho-


(Christian Dahlqvist) #10

How many indices and shards do you have in the cluster? How much data? How much data are you indexing per second? What bulk size are you using?
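Those numbers can be pulled from the _cat APIs, e.g. (a sketch using one of the cluster addresses from the thread; the s parameter for sorting assumes a 5.1+ release):

 curl '10.146.134.11:8888/_cat/indices?v&s=store.size:desc'
 curl '10.146.134.11:8888/_cat/shards?v'
 curl '10.146.134.11:8888/_cat/allocation?v'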


(Ravi) #11

Sorry for the late reply, I was on holiday.
Here is the report from curl -XGET 10.146.134.12:8888/_cat/shards?v

logstash-custom_ats_2-vector-2017.07.05 8 r STARTED 43189469 63.5gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 8 p STARTED 43189470 63.3gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 3 r STARTED 43195576 63.4gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 3 p STARTED 43195576 63.3gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 11 p STARTED 43178812 63.2gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 11 r STARTED 43178812 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 4 p STARTED 43190597 63.4gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 4 r STARTED 43190597 63.2gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 1 p STARTED 43182353 63.1gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 1 r STARTED 43182353 63.4gb 10.146.134.6
logstash-custom_ats_2-vector-2017.07.05 7 r STARTED 43184131 63.4gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 7 p STARTED 43184131 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 10 r STARTED 43190307 63.5gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 10 p STARTED 43190307 63.2gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 6 r STARTED 43185770 63.5gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 6 p STARTED 43185770 63.1gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 2 r STARTED 43192902 63.5gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 2 p STARTED 43192902 63.2gb 10.146.134.10
logstash-custom_ats_2-vector-2017.07.05 5 p STARTED 43187653 63.3gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 5 r STARTED 43187653 63.5gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 9 p STARTED 43196568 63.3gb 10.146.134.7
logstash-custom_ats_2-vector-2017.07.05 9 r STARTED 43196568 63.4gb 10.146.134.8
logstash-custom_ats_2-vector-2017.07.05 0 p STARTED 43189187 63.3gb 10.146.134.24
logstash-custom_ats_2-vector-2017.07.05 0 r STARTED 43189189 63.5gb 10.146.134.10


(Christian Dahlqvist) #12

That seems to just be a single index. What is the total number of shards and data volume for the whole cluster, per day and in total?


(Ravi) #13

Yes, I have logs from 11 sources on the client servers, which are created as different indices as I mentioned.

In my Logstash configuration, my output looks like this:

 output {
   #stdout {codec => rubydebug}
   elasticsearch {
     hosts => ["10.146.134.24:8888","10.146.134.6:8888","10.146.134.7:8888","10.146.134.8:8888"]
     index => "logstash-%{type}-%{client}-%{+YYYY.MM.dd}"
   }
 }

Currently each index has 12 shards, each shard is around 60 GB, and storing all events in one index comes to around 2.1 TB of logs every day.

logstash-puppet-vector-2017.07.04 9 r STARTED 695 359.7kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 9 p STARTED 695 401.3kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 1 p STARTED 624 298.3kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 1 r STARTED 624 351.8kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 11 p STARTED 717 408.5kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 11 r STARTED 717 408.8kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 6 p STARTED 675 331.7kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 6 r STARTED 675 401.3kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 10 r STARTED 657 382.5kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 10 p STARTED 657 368.6kb 10.146.134.10 elk-es-ho-09
logstash-puppet-vector-2017.07.04 4 r STARTED 704 322.7kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 4 p STARTED 704 403.4kb 10.146.134.6 elk-es-ho-05
logstash-puppet-vector-2017.07.04 8 p STARTED 669 362.5kb 10.146.134.8 elk-es-ho-07
logstash-puppet-vector-2017.07.04 8 r STARTED 669 350.8kb 10.146.134.10 elk-es-ho-09
logstash-puppet-vector-2017.07.04 7 r STARTED 698 309.9kb 10.146.134.7 elk-es-ho-06
logstash-puppet-vector-2017.07.04 7 p STARTED 698 350.6kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 0 r STARTED 686 354.3kb 10.146.134.24 elk-es-ho-04
logstash-puppet-vector-2017.07.04 0 p STARTED 686 382.4kb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 9 r STARTED 4293057 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 9 p STARTED 4293057 4.2gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 10 r STARTED 4296013 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 10 p STARTED 4296013 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 11 r STARTED 4294355 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 11 p STARTED 4294355 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 5 r STARTED 4294367 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 5 p STARTED 4294367 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 1 r STARTED 4293620 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 1 p STARTED 4293620 4.1gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 2 p STARTED 4294387 4.1gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 2 r STARTED 4294387 4.2gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 7 p STARTED 4292952 4.2gb 10.146.134.7 elk-es-ho-06
logstash-teakd-vector-2017.06.29 7 r STARTED 4292952 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 8 p STARTED 4294534 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 8 r STARTED 4294534 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 4 p STARTED 4294611 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 4 r STARTED 4294611 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 3 p STARTED 4292768 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 3 r STARTED 4292768 4.1gb 10.146.134.8 elk-es-ho-07
logstash-teakd-vector-2017.06.29 6 r STARTED 4295711 4.2gb 10.146.134.10 elk-es-ho-09
logstash-teakd-vector-2017.06.29 6 p STARTED 4295711 4.2gb 10.146.134.6 elk-es-ho-05
logstash-teakd-vector-2017.06.29 0 r STARTED 4291871 4.2gb 10.146.134.24 elk-es-ho-04
logstash-teakd-vector-2017.06.29 0 p STARTED 4291871 4.2gb 10.146.134.10 elk-es-ho-09
logstash-error-vector-2017.06.26 11 p STARTED 771885 181.9mb 10.146.134.10 elk-es-ho-09

logstash-custom_ats_2-vector-2017.07.05 3 r STARTED 43205422 63.4gb 10.146.134.10 elk-es-ho-09
logstash-custom_ats_2-vector-2017.07.05 3 p STARTED 43205422 63.4gb 10.146.134.6 elk-es-ho-05
logstash-custom_ats_2-vector-2017.07.05 11 p STARTED 43188652 63.2gb 10.146.134.8 elk-es-ho-07
logstash-custom_ats_2-vector-2017.07.05 11 r STARTED 43188653 63.2gb 10.146.134.10 elk-es-ho-09
logstash-custom_ats_2-vector-2017.07.05 4 p STARTED 43200448 63.4gb 10.146.134.7 elk-es-ho-06


(Christian Dahlqvist) #14

Can you provide the full output of _cat/indices, e.g. via a gist?


(Ravi) #16

xtbd


(Christian Dahlqvist) #17

One of your indices is very much larger than all the others and could probably benefit from a larger number of primary shards. Most of your indices do however have relatively little data and are therefore oversharded. For most of them one or two primary shards would be sufficient and most likely more efficient.
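One way to sketch this with two templates, where the higher "order" lets the specific pattern override the general default (assuming ES 5.x template syntax; the template names and shard counts are hypothetical examples):

 curl -XPUT '10.146.134.11:8888/_template/logstash-default' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-*",
   "order": 0,
   "settings": { "index.number_of_shards": 1 }
 }'

 curl -XPUT '10.146.134.11:8888/_template/logstash-ats' -H 'Content-Type: application/json' -d '
 {
   "template": "logstash-custom_ats_2-*",
   "order": 1,
   "settings": { "index.number_of_shards": 12 }
 }'

With ~760 GB of primary data per day in the large index, 12 primaries keeps each shard in the recommended tens-of-GB range, while the small sources drop to a single shard each.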


(Ravi) #19

Great, I will give it a try now. Thanks again.


(system) #20

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.