Hey guys,
I'm making some changes to my Elasticsearch cluster and need a little help with node role allocation and instance configuration.
Until now we had 3 nodes acting as both data and master-eligible nodes, with Kibana on another node connected to the cluster.
This setup gave us a really hard time, so we decided to take our Elasticsearch setup a step further.
Our Data:
The cluster is managed with Curator cron jobs that keep 2 weeks of raw data (time-based indices) open for debug querying and one more week closed for emergencies.
Besides that, we have another index that is kept up to date from the raw-data stream.
The cluster holds ~100GB of data in 15 different indices.
Each index is divided into 5 shards with 1 replica.
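A back-of-the-envelope sketch of what this shard layout works out to. It assumes (none of this is stated above) that all 15 indices use 5 primaries and 1 replica, that the ~100GB figure is primary data only, and that data is spread evenly across shards and the 3 data nodes. The projected case reuses the ~500GB target from question 3 with the same index layout.

```python
# Back-of-the-envelope shard arithmetic for the layout described above.
# Assumptions (not stated in the post): all 15 indices use 5 primaries and
# 1 replica, the ~100GB figure is primary data only, and data is spread evenly.

def shard_layout(indices, primaries_per_index, replicas, primary_data_gb, data_nodes):
    primary_shards = indices * primaries_per_index
    total_shards = primary_shards * (1 + replicas)  # primaries + replica copies
    return {
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "avg_primary_shard_gb": primary_data_gb / primary_shards,
        "shards_per_data_node": total_shards / data_nodes,
        "disk_per_data_node_gb": primary_data_gb * (1 + replicas) / data_nodes,
    }

current = shard_layout(15, 5, 1, primary_data_gb=100, data_nodes=3)
# Same index layout, data grown to the ~500GB target from question 3.
projected = shard_layout(15, 5, 1, primary_data_gb=500, data_nodes=3)

for label, layout in (("current", current), ("projected", projected)):
    print(label, {key: round(value, 2) for key, value in layout.items()})
```

Under these assumptions the current cluster has 75 primaries (150 shard copies, 50 per data node) averaging about 1.3GB each; at 500GB the average primary would be roughly 6.7GB.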
The current setup is composed of 7 nodes:
3 dedicated master nodes - 2 CPU, 4GB RAM.
3 dedicated data nodes - 4 CPU, 16GB RAM.
1 coordinating node with Kibana - 4 CPU, 8GB RAM.
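A sketch of the usual heap-sizing rule of thumb applied to the machines listed above: give the JVM at most half of the RAM and keep it below the compressed-oops threshold (around 32GB). The 31GB cap is a conservative assumption, and `reserved_gb` is a hypothetical allowance for Kibana sharing the coordinating node, not a documented value. Sizes are printed in MB because `-Xms`/`-Xmx` do not accept fractional gigabytes.

```python
# Rule-of-thumb heap sizing for the node types listed above.
# Assumptions: heap <= 50% of the RAM available to Elasticsearch (the rest is
# left for the OS filesystem cache and off-heap use), capped at a conservative
# 31GB to stay under the ~32GB compressed-oops threshold.
COMPRESSED_OOPS_CAP_GB = 31

def suggested_heap_mb(ram_gb, reserved_gb=0):
    """Return a heap size in MB for a machine with ram_gb of RAM.

    reserved_gb is a hypothetical allowance for other processes on the same
    machine (e.g. Kibana on the coordinating node); it is a guess, not a
    documented value.
    """
    heap_gb = min((ram_gb - reserved_gb) / 2, COMPRESSED_OOPS_CAP_GB)
    return int(heap_gb * 1024)

machines = [
    ("master, 4GB RAM", 4, 0),
    ("master, 2GB RAM (question 1)", 2, 0),
    ("data, 16GB RAM", 16, 0),
    ("coordinating + Kibana, 8GB RAM (1GB assumed for Kibana)", 8, 1),
    ("data, 32GB RAM (the 2-node alternative)", 32, 0),
]
for label, ram_gb, reserved_gb in machines:
    heap_mb = suggested_heap_mb(ram_gb, reserved_gb)
    # Xms and Xmx are normally set to the same value (e.g. in jvm.options).
    print(f"{label}: -Xms{heap_mb}m -Xmx{heap_mb}m")
```

For the 2GB master case this works out to the 1GB heap mentioned in question 1.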
These are my questions:
1. Does the instance setup make sense? Is it reasonable? I read that master nodes can usually be quite "light" compared to data nodes; does a 2GB RAM instance with a 1GB heap sound good?
2. When I run GET /_cat/nodes, I notice that ram.percent is pretty high (around 90% or higher on all nodes):
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.17.103           23          99  11    5.88    5.22     5.05 d         -      DATA1
172.31.12.76            31          94  37    0.30    0.20     0.12 m         *      MASTER1
172.31.29.43            22          99   7    5.26    5.00     4.98 d         -      DATA3
172.31.14.236           12          89   0    0.00    0.00     0.00 m         -      MASTER2
172.31.14.54            21          97   1    0.12    0.04     0.01 -         -      KIBANA
172.31.14.55            23          93   0    0.00    0.00     0.00 m         -      MASTER3
172.31.17.46            20          99   9    5.88    5.12     4.97 d         -      DATA2
Is this a healthy state, or should I limit this setting? If so, how can it be done?
3. Our system handles ~150 docs per second, but we want to be able to scale up to 1,000 docs per second (with data growing to ~500GB) with minimal adjustments in the future.
Any recommendations that would get us closer to that goal?
4. Should we add a dedicated coordinating node besides the node with Kibana? What would the immediate effects of such a change be? For now, it's as if we run the cluster without any coordinating node, because the node with Kibana runs Elasticsearch on localhost and communicates with the cluster through the transport client.
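Regarding question 2: ram.percent in _cat/nodes is generally an OS-level figure that, on Linux, includes memory used by the filesystem cache, so values near 100% on data nodes are usually expected; heap.percent is the JVM figure that tends to matter. The sketch below parses the table pasted above and reports the two side by side. The 75% heap alert threshold is a hypothetical choice for illustration, not an Elasticsearch setting.

```python
# Parse a /_cat/nodes?v table and separate JVM heap usage from OS RAM usage.
# The 75% heap alert threshold is a hypothetical choice for illustration.
CAT_NODES = """\
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.17.103 23 99 11 5.88 5.22 5.05 d - DATA1
172.31.12.76 31 94 37 0.30 0.20 0.12 m * MASTER1
172.31.29.43 22 99 7 5.26 5.00 4.98 d - DATA3
172.31.14.236 12 89 0 0.00 0.00 0.00 m - MASTER2
172.31.14.54 21 97 1 0.12 0.04 0.01 - - KIBANA
172.31.14.55 23 93 0 0.00 0.00 0.00 m - MASTER3
172.31.17.46 20 99 9 5.88 5.12 4.97 d - DATA2
"""

HEAP_ALERT_PERCENT = 75

def parse_cat_nodes(text):
    """Turn the whitespace-separated table into one dict per node."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]

def heap_report(nodes, threshold=HEAP_ALERT_PERCENT):
    """Map node name to heap %, RAM %, and whether heap crosses the threshold."""
    report = {}
    for node in nodes:
        heap = int(node["heap.percent"])
        ram = int(node["ram.percent"])
        report[node["name"]] = {"heap": heap, "ram": ram, "heap_alert": heap >= threshold}
    return report

report = heap_report(parse_cat_nodes(CAT_NODES))
for name, stats in sorted(report.items()):
    print(f"{name:8} heap={stats['heap']:3}% ram={stats['ram']:3}% alert={stats['heap_alert']}")
```

On a live cluster the same columns can be requested directly with `GET /_cat/nodes?v&h=name,heap.percent,ram.percent`.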
-
How should we talk to the cluster from the web client? Provide a list of all the node IPs? Only the masters?
-
Does a cluster of 2 data nodes with 32GB RAM each sound better than the current setup?
-
Is a separate monitoring cluster a mandatory configuration in a production environment? Besides keeping monitoring data consistent, does it take some load off the cluster?
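For the question about which nodes the web client should talk to: a commonly recommended pattern is to keep client traffic off dedicated master nodes and point clients at coordinating-only and/or data nodes. The sketch below builds such a host list from the _cat/nodes output in question 2, assuming plain HTTP on the default port 9200; `client_hosts` is a hypothetical helper, and the naive round-robin only illustrates the idea, since the official clients generally accept a list of hosts and balance across it themselves.

```python
import itertools

# (ip, node.role, name) taken from the /_cat/nodes output in question 2.
NODES = [
    ("172.31.17.103", "d", "DATA1"),
    ("172.31.12.76", "m", "MASTER1"),
    ("172.31.29.43", "d", "DATA3"),
    ("172.31.14.236", "m", "MASTER2"),
    ("172.31.14.54", "-", "KIBANA"),
    ("172.31.14.55", "m", "MASTER3"),
    ("172.31.17.46", "d", "DATA2"),
]

HTTP_PORT = 9200  # Elasticsearch's default HTTP port; assumes http.port was not changed

def client_hosts(nodes, port=HTTP_PORT):
    """Hypothetical helper: base URLs for client traffic, skipping dedicated masters.

    A role string of exactly "m" marks a dedicated master node, "d" a data
    node, and "-" a coordinating-only node. Plain HTTP (no TLS) is assumed.
    """
    return [f"http://{ip}:{port}" for ip, role, _ in nodes if role != "m"]

hosts = client_hosts(NODES)
next_host = itertools.cycle(hosts)  # naive round-robin over the allowed nodes
for _ in range(5):
    print(next(next_host))
```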
Thanks so much for your help:)