Hi,
I'm a newbie and I have a question:
I'm setting up ELK with one cluster (cluster name: Main-Cluster).
I want to have three nodes (Node1, Node2, Node3) to receive data from Logstash. (Are these data nodes?)
I see there are three different kinds of nodes you can configure in Elasticsearch (data node, dedicated master node, client node):
Master-Node: 10.1.11.111
Client-Node: 10.1.11.112
Node1 : 10.1.11.113
Node2 : 10.1.11.114
Node3 : 10.1.11.115
Servers: 16 vCPU, 16 GB RAM (based on PureFlex)
So I have some questions:
I must create 5 servers, right? That is, 3 data nodes (Node1, Node2, Node3) + 1 dedicated master node + 1 client node. Is that right?
And which node should Logstash be configured to point to (Node1, 2, 3, the master node, or the client node)?
And how do I configure Logstash to load balance across ES?
It's one valid way of doing it. It's not clear that you actually need dedicated master and client nodes. Dedicated masters are usually only recommended for clusters of ten or more nodes. With five 16 GB nodes at my disposal I probably would've made all five equal, each acting as both a master-eligible node and a data node.
I think the client node would be the best pick. In any case not the master node.
In Logstash 2.0 and later the elasticsearch output plugin can be configured with multiple ES nodes for failure tolerance (i.e. it'll connect to another node if the current node becomes unavailable). That isn't really load balancing, though, and I'm not sure load balancing is very realistic, since the node you connect to and the node carrying out the work aren't necessarily the same. Keep in mind that with a single master node your cluster won't be very failure tolerant anyway.
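To put a number on the failure tolerance point, here is a minimal sketch. It assumes majority-based master election, i.e. the usual (n / 2) + 1 quorum that `discovery.zen.minimum_master_nodes` is normally set to in Elasticsearch 1.x/2.x. The function names are hypothetical and only illustrate the arithmetic.

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: (n / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


def tolerated_master_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


# Compare a single dedicated master with three or five equal master/data nodes.
for n in (1, 3, 5):
    print(f"{n} master-eligible node(s): quorum={master_quorum(n)}, "
          f"can lose {tolerated_master_failures(n)}")
```

Under that assumption a single dedicated master tolerates no master failures at all, while three master-eligible nodes tolerate one and five tolerate two.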
That configuration looks okay.
Second group of questions:
For fault tolerance I'd want to configure Filebeat to connect to all three data nodes.
Yes.
Since you only have a single master node (which I, again, don't recommend) I'd run as few services as possible on that machine.
I also have to collect logs from 100 servers, so how many nodes should I create in the cluster for the best performance?
For now I have created 3 ES nodes (all three are master and data nodes and have joined the same cluster):
Node-01
Node-02
Node-03
I have also created 2 Logstash servers (LS-01 & LS-02).
I have 10 servers to collect logs from (Server 1, 2, 3, ....., 9, 10), using Filebeat, Topbeat, Packetbeat...
I plan to configure the Beats like this:
Servers 1, 2, 3, 4, 5 point to LS-01.
Servers 6, 7, 8, 9, 10 point to LS-02.
And 1 server has Kibana installed (KB-01).
Kibana configuration:
elasticsearch.url: "Node-03:9200"
Is all of this right?
So from Kibana, can I view all the data on the 3 nodes (Node-01, Node-02, Node-03)?
And whichever node I connect to, can I search all the data (logs from the 10 servers)?
That depends on how long you want to keep the logs, the query load, how fast you need queries to be, how many replicas of the data you want, and so on. There's a reason there's no official formula for sizing clusters.
Splitting the cluster and fragmenting the resources sounds like a bad idea.
Using a queue is a nice way of distributing the load between multiple Logstash instances. With 300 GB/day I suspect a single Logstash instance won't be able to keep up.
But how do I calculate the number of Logstash servers, Logstash instances, and ES nodes in this case:
Total servers: ~100 servers to collect logs from (web servers, DB, VMware, Lync, Exchange mail ....), 75% of them virtual machines.
Total logs per day: 300 GB (from the 100 servers, but mainly Exchange mail logs, 150 GB).
All servers are in the same location. Logs are kept for 1 month and I need only 1 replica of the data.
Can you suggest for me:
How many Logstash shippers?
How many Logstash instances?
How many ES nodes?
Servers: 16 vCPU + 16 GB RAM
I can't find any documentation that describes deploying a system like this in detail...
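A rough back-of-the-envelope calculation for the storage side of this question, using the figures above (300 GB/day, 1 month retention, 1 replica). This is only a sketch: it assumes the indexed size on disk equals the raw log size (`EXPANSION_FACTOR = 1.0`), which in practice varies with mappings and compression, and it says nothing about CPU, heap, or query load. All names are hypothetical.

```python
DAILY_LOG_GB = 300        # from the question: ~300 GB of logs per day
RETENTION_DAYS = 30       # "keep logs 1 month", taken here as 30 days
REPLICAS = 1              # one replica of every primary shard
EXPANSION_FACTOR = 1.0    # ASSUMPTION: indexed size == raw size; measure on real data


def estimate_storage_gb(daily_gb, retention_days, replicas, expansion=1.0):
    """Total disk needed for primaries plus replicas over the retention window."""
    primary_gb = daily_gb * retention_days * expansion
    return primary_gb * (1 + replicas)


total_gb = estimate_storage_gb(DAILY_LOG_GB, RETENTION_DAYS, REPLICAS, EXPANSION_FACTOR)

# Average sustained ingest rate if the 300 GB were spread evenly over 24 hours.
avg_ingest_mb_s = DAILY_LOG_GB * 1024 / 86400

print(f"total storage: {total_gb:.0f} GB")
print(f"average ingest: {avg_ingest_mb_s:.2f} MB/s")
```

With these inputs the cluster has to hold about 18,000 GB in total, and the average ingest rate is roughly 3.6 MB/s (peaks will be higher). Dividing the total by the usable disk per data node gives a lower bound on the node count from storage alone; indexing and query performance still have to be tested with real data.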