Deploy an ES cluster with multiple nodes?


(Tat Dat Pham) #1

Hi,
I'm a newbie and I have a question:
I'm setting up ELK with one cluster (cluster name: Main-Cluster).
I want three nodes (Node1, Node2, Node3) to receive data from Logstash (data nodes?).
I see three different kinds of nodes you can configure in Elasticsearch (data node, dedicated master node, client node):

  • Master-Node: 10.1.11.111
  • Client-Node: 10.1.11.112
  • Node1 : 10.1.11.113
  • Node2 : 10.1.11.114
  • Node3 : 10.1.11.115

Servers are 16 vCPU / 16 GB RAM (based on PureFlex).

So I have some questions:

  1. I must create 5 servers, right? (3 data nodes (Node1, Node2, Node3) + 1 dedicated master node + 1 client node.) Is that right?
  2. Where should Logstash be configured to point: the data nodes (Node1, 2, 3), the master node, or the client node?
  3. How do I configure Logstash to load-balance across ES?
  4. Is this the correct configuration?

output {
  elasticsearch {
    hosts => "10.1.11.111:9200"
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

  1. Should I install Logstash on each of Node1, Node2, and Node3, or only on one node?
  2. Should Kibana be configured to point to the client node (10.1.11.112)?
  3. Should Kibana and Logstash be installed on the master node?

And I will collect logs from 100 servers, so how many nodes should I create in the cluster for best performance?


(Magnus Bäck) #2

First group of numbered questions:

  1. It's one valid way of doing it, but it's not clear that you actually need dedicated master and client nodes; dedicated masters are usually only recommended for clusters of ten or more nodes. With five 16 GB nodes at my disposal I probably would have made all nodes equal, acting as both master and data nodes.
  2. I think the client node would be the best pick. In any case not the master node.
  3. In Logstash 2.0 and later the elasticsearch output plugin can be configured to know about multiple ES nodes for failure tolerance (i.e. it'll connect to another node if the current one becomes unavailable), but this isn't really load balancing, and I'm not sure load balancing is very realistic here since the node you connect to and the node carrying out the work aren't the same thing. Keep in mind that with a single master node your cluster won't be very failure tolerant anyway.
  4. That configuration looks okay.
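To illustrate the multi-node failover mentioned in point 3, here is a minimal sketch of an elasticsearch output with every data node listed (the IPs are the data nodes from the original post; this is a sketch, not a verified config):

```conf
output {
  elasticsearch {
    # List all data nodes; the plugin connects to another entry
    # if the current connection becomes unavailable.
    hosts => ["10.1.11.113:9200", "10.1.11.114:9200", "10.1.11.115:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
```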

Second group of questions:

  1. For fault tolerance I'd want to configure Filebeat to connect to all three data nodes.
  2. Yes.
  3. Since you only have a single master node (which I, again, don't recommend) I'd run as few services as possible on that machine.
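Pointing Filebeat at all three data nodes (point 1 above) could be sketched like this in filebeat.yml (Beats 1.x-era nested syntax; newer versions flatten this to `output.elasticsearch.hosts` — a sketch, not a tested config):

```yaml
output:
  elasticsearch:
    # All three data nodes from the post; Filebeat distributes
    # events across the listed hosts and skips unreachable ones.
    hosts: ["10.1.11.113:9200", "10.1.11.114:9200", "10.1.11.115:9200"]
```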

And I will collect logs from 100 servers, so how many nodes should I create in the cluster for best performance?

That question is too broad and can't be answered.


(Tat Dat Pham) #3

Thanks for the great support.

But maybe not everything is clear to me :frowning:

Now I create 3 ES nodes (all three are master and data nodes and join the same cluster):

  • Node-01
  • Node-02
  • Node-03

And I create 2 Logstash servers (LS-01 & LS-02).
I have 10 servers that need log collection (Server 1, 2, 3, ..., 9, 10) (using Filebeat, Topbeat, Packetbeat...).
I think the Beats configuration is:
Servers 1, 2, 3, 4, 5 point to LS-01.
Servers 6, 7, 8, 9, 10 point to LS-02.

And the configuration on LS-01:

output {
  elasticsearch {
    hosts => "Node-01:9200"
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

And the configuration on LS-02:

output {
  elasticsearch {
    hosts => "Node-02:9200"
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

And I have 1 server with Kibana installed (KB-01).
The Kibana configuration:

elasticsearch.url: "Node-03:9200"

Is all of this right?
So from Kibana, can I view all the data on the 3 nodes (Node-01, Node-02, Node-03)?
And when I query any node, can I search all the data (logs from the 10 servers)?

Thank you!


(Magnus Bäck) #4

Is all of this right?

Yes, it looks reasonable.

So from Kibana, can I view all the data on the 3 nodes (Node-01, Node-02, Node-03)?

If all nodes are part of the same cluster, yes.

And when I query any node, can I search all the data (logs from the 10 servers)?

Yes.


(Tat Dat Pham) #5

Thank you very much :grin:

Coming back to my question:

And I will collect logs from 100 servers, so how many nodes should I create in the cluster for best performance?

  • Total servers: ~100 servers need log collection (web servers, DBs, VMware, Lync, Exchange mail...) and 75% are virtual machines
  • Total logs/day: 300 GB per day (from the 100 servers, but mainly Exchange mail logs (150 GB))
  • All servers are in the same location

Can you suggest something for me:

  1. How many nodes should I create?
  2. Should I split the cluster (or use only one cluster)?
  3. And with many servers and 300 GB of logs, should I use a message queue like RabbitMQ?

Thanks, again!


(Magnus Bäck) #6
  1. That depends on how long you want to keep the logs, the query load, how fast the queries need to be, how many replicas of the data you want, and so on. There's a reason there's no official formula for sizing clusters.
  2. Splitting the cluster and fragmenting the resources sounds like a bad idea.
  3. Using a queue is a nice way of distributing the load between multiple Logstash instances. With 300 GB/day I suspect a single Logstash instance won't be able to keep up.
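The queue-based fan-out in point 3 could be sketched with Logstash's rabbitmq plugins (the broker address, exchange, and queue names here are hypothetical, not from the thread):

```conf
# On the shipper Logstash: receive from Beats, write to the queue.
output {
  rabbitmq {
    host => "mq.example.com"   # hypothetical broker address
    exchange => "logs"         # hypothetical exchange name
    exchange_type => "direct"
    key => "logstash"
  }
}

# On each indexer Logstash: read from the queue, write to ES.
input {
  rabbitmq {
    host => "mq.example.com"
    queue => "logs"            # hypothetical queue name
    key => "logstash"
  }
}
```

Several indexer instances can consume from the same queue, which is what spreads the 300 GB/day across multiple Logstash processes.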

(Tat Dat Pham) #7

I keep logs for 1 month and I need only one replica of the data.
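For rough sizing only, the retention and replica figures above can be turned into a raw storage estimate (actual on-disk size also depends on compression and indexing overhead, which this sketch ignores):

```python
# Back-of-the-envelope storage estimate from the figures in the thread.
daily_gb = 300        # ~300 GB of logs ingested per day
retention_days = 30   # keep logs for one month
copies = 2            # primary shards + 1 replica

total_gb = daily_gb * retention_days * copies
print(total_gb)       # 18000 GB, i.e. ~18 TB across the cluster
```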

So if I use message queues, what is the model for ELK in this case?

Beats -> message queue -> multiple Logstash instances -> multiple ES nodes -> Kibana

But I don't see a Beats configuration that works with message queues... and I read some posts that said:

Beats -> Logstash -> message queue -> multiple Logstash instances -> multiple ES nodes -> Kibana

That means I must have:

servers needing log collection (using Beats) -> Logstash server -> message-queue server -> multiple Logstash instance servers -> ES servers -> Kibana

http://www.upsieutoc.com/images/2015/12/16/ELK-Model.png


(Magnus Bäck) #8

Yes, that's right.


(Tat Dat Pham) #9

But how do I calculate the number of Logstash servers, Logstash instances, and ES nodes in this case?

Total servers: ~100 servers need log collection (web servers, DBs, VMware, Lync, Exchange mail...) and 75% are virtual machines
Total logs/day: 300 GB per day (from the 100 servers, but mainly Exchange mail logs (150 GB))
All servers are in the same location
Keep logs for 1 month and I need only 1 replica of the data

Can you suggest for me...
How many Logstash shippers?
How many Logstash instances?
How many ES nodes?

Servers are 16 vCPU + 16 GB RAM.
I can't find a document covering this kind of deployment in detail...

Thank you!


(Magnus Bäck) #10

As I said, there is no formula for this and any number would just be a rough guess. I haven't run a cluster of that size myself.

16 vCPU and 16 GB RAM seems like an odd configuration for this workload. I suspect you'll run out of RAM long before you run out of CPU.

