My question is about the correct way to configure the Logstash output for Elasticsearch.
Right now my Elasticsearch cluster is one master node, 2 ingest nodes and 1 data node.
My Logstash output is pointed at the ingest nodes. The data is routed based on some conditions, so an event can be sent to ingest_node_1 or to ingest_node_2, but never to both. This has worked fine so far.
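For anyone trying to picture the setup, a rough sketch of that kind of conditional output could look like this. The `[type]` field and its value are made up for illustration (the actual conditions are not shown here), and `ingest_node_1` / `ingest_node_2` stand in for the real host names:

```
output {
  # Hypothetical condition: the real routing rule is not shown in this post.
  if [type] == "app_a" {
    elasticsearch {
      hosts => ["ingest_node_1:9200"]
    }
  } else {
    # Everything else goes to the second ingest node, so no event hits both.
    elasticsearch {
      hosts => ["ingest_node_2:9200"]
    }
  }
}
```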
But after checking the stack, reading the forums and doing some general research, I'm starting to wonder whether I should point the output at the master node instead of the ingest nodes.
The plan is to grow it to 2 master nodes, 4 ingest nodes and 4 data nodes, but for now this cluster is enough to handle the current volume of data (yes, I know replication isn't working properly right now because we only have 1 data node; the first thing we will add is another data node). That's why I was trying to make it work with this structure.
So basically I was right: Logstash should send the data directly to the ingest nodes, shouldn't it? And is there any way to balance the load? I was thinking of putting all the ingest nodes behind a load balancer for Logstash, but I'm not sure that would work well.
Use the IP addresses of three non-master nodes in your Elasticsearch cluster in the host line. When the hosts parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses. Also note that the default port for Elasticsearch is 9200 and can be omitted in the configuration above.
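A minimal sketch of what such an output might look like, assuming the ingest nodes are the non-master nodes to list. The IP addresses are placeholders, not values from this thread:

```
output {
  elasticsearch {
    # Placeholder addresses: replace with the real ingest node IPs.
    # Logstash spreads requests across every host in this list.
    # ":9200" is the default port and could be left off.
    hosts => ["10.0.0.11:9200", "10.0.0.12:9200"]
  }
}
```

With a list like this, Logstash does the balancing itself, so a separate load balancer in front of the ingest nodes should not be needed for that purpose alone.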
So I just need to list all the ingest nodes in the output definition and stop routing on my conditions (which, basically, were only there to balance the load).