We are working on designing an ELK cluster for our client and came up with the design approach below.
Filebeat will read the data from log files, add tags, and publish it to Kafka (Filebeat acts as the Kafka producer).
A Kafka consumer (Logstash's kafka input) will read from the Kafka topics and feed the events to Logstash.
Logstash will parse the logs based on the tags and index them into the Elasticsearch cluster.
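Concretely, the wiring we have in mind is roughly the following (host, topic, and path names are just placeholders):

```yaml
# filebeat.yml (sketch): read the logs, tag them, publish to Kafka
filebeat.prospectors:
  - paths: ["/var/log/app/*.log"]
    tags: ["client_a"]

output.kafka:
  hosts: ["kafka1:9092"]
  topic: "client_a"
```

```
# Logstash pipeline (sketch): consume from Kafka, index into Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["client_a"]
    codec => json   # Filebeat's Kafka output sends JSON-encoded events
  }
}
output {
  elasticsearch { hosts => ["es-node1:9200"] }
}
```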
We have a few questions about the best design approach:
We have different clients. Is it better to have an individual Filebeat configuration file for each client, or can we keep the configuration in one Filebeat configuration file?
Similarly, is it better to have an individual Logstash configuration file for each client?
To load-balance Logstash's writes into the Elasticsearch cluster, do we need a load balancer (like Kafka) between them?
We have different clients. Is it better to have an individual Filebeat configuration file for each client, or can we keep the configuration in one Filebeat configuration file?
Filebeat doesn't care. Use whatever fits your needs.
Similarly, is it better to have an individual Logstash configuration file for each client?
Same thing here. However, you may want to consider running multiple Logstash pipelines if you want to isolate events from different clients. Depends on how many clients you have though; it's probably not advisable to have too many pipelines.
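If you do end up isolating clients into separate pipelines, that's configured in pipelines.yml, along these lines (pipeline IDs and paths are made up; multiple pipelines require Logstash 6.0+):

```yaml
# config/pipelines.yml: one isolated pipeline per client
- pipeline.id: client_a
  path.config: "/etc/logstash/conf.d/client_a.conf"
- pipeline.id: client_b
  path.config: "/etc/logstash/conf.d/client_b.conf"
```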
To load-balance Logstash's writes into the Elasticsearch cluster, do we need a load balancer (like Kafka) between them?
I don't see how it's even possible to put Kafka between Logstash and Elasticsearch, but if you're talking about an HTTP load balancer, I don't think it's necessary. Logstash's elasticsearch output distributes requests between the ES nodes on its own, and most of the work during indexing is done by the node(s) storing the shards being written to, which you can't control by distributing requests.
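In other words, listing the ES nodes in the elasticsearch output is enough; something like this (host names are examples):

```
output {
  elasticsearch {
    # The output itself spreads requests across the listed hosts
    hosts => ["es-node1:9200", "es-node2:9200", "es-node3:9200"]
  }
}
```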
Filebeat doesn't care. Use whatever fits your needs.
I am new to Filebeat. As far as I know, under prospectors in the filebeat.yml file we would have 3 log files for each client, and with approximately 25 clients we would have 75 tags defined. But we can have only one output per .yml file. We have an individual Kafka topic for each client; in that case, do I need individual Filebeat configurations?
Similarly, is it better to have an individual Logstash configuration file for each client?
We have individual Kafka consumers for each client. Is it possible to configure Logstash to listen to all the Kafka topics and then parse/filter the data based on the tags, or would we need 25 Logstash pipelines?
I don't see how it's even possible to put Kafka between Logstash and Elasticsearch, but if you're talking about an HTTP load balancer, I don't think it's necessary. Logstash's elasticsearch output distributes requests between the ES nodes on its own, and most of the work during indexing is done by the node(s) storing the shards being written to, which you can't control by distributing requests.
Sorry for the confusion; I meant an HTTP load balancer. We understood that we can't control the request distribution.
I am new to Filebeat. As far as I know, under prospectors in the filebeat.yml file we would have 3 log files for each client, and with approximately 25 clients we would have 75 tags defined. But we can have only one output per .yml file. We have an individual Kafka topic for each client; in that case, do I need individual Filebeat configurations?
I'm not sure how Filebeat works here. You may need one Filebeat instance per client. Check the documentation.
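One thing worth checking: recent Filebeat versions let the Kafka output's topic option reference event fields, in which case a single configuration could cover all 25 clients. A sketch to verify against the docs (client names and paths are made up):

```yaml
filebeat.prospectors:
  - paths: ["/var/log/client_a/*.log"]
    tags: ["client_a"]
    fields:
      client: "client_a"
  - paths: ["/var/log/client_b/*.log"]
    tags: ["client_b"]
    fields:
      client: "client_b"
  # ...one prospector (or three, for your three log files) per client

output.kafka:
  hosts: ["kafka1:9092"]
  # A single output; the topic is resolved per event from the field set above
  topic: "%{[fields.client]}"
```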
We have individual Kafka consumers for each client. Is it possible to configure Logstash to listen to all the Kafka topics and then parse/filter the data based on the tags?
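For what it's worth, a single Logstash pipeline can subscribe to all the topics with one kafka input and branch on the tags in its filter section, rather than needing 25 pipelines. A minimal sketch (topic and tag names are placeholders):

```
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    # One consumer subscribed to every client's topic
    topics => ["client_a", "client_b", "client_c"]
    codec => json
  }
}

filter {
  # Branch on the tags Filebeat attached to each event
  if "client_a" in [tags] {
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  }
  # ...one conditional per client tag
}

output {
  elasticsearch { hosts => ["es-node1:9200"] }
}
```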