Best practice with Logstash output and ES

Hello,

My question is about the correct way to configure the Logstash output for Elasticsearch.

Right now my Elasticsearch cluster has 1 master node, 2 ingest nodes and 1 data node.

My Logstash output points to the ingest nodes. The data is routed based on some conditions, so an event is sent either to ingest_node_1 or to ingest_node_2, but never to both. This has worked fine so far.
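
As a rough sketch of that kind of conditional routing (the `[type]` field, its value, and the addresses are hypothetical placeholders, not the actual configuration):

```
output {
    # Hypothetical condition: the real routing rules are not shown in this thread.
    if [type] == "app_logs" {
        elasticsearch {
            # Placeholder address standing in for ingest_node_1
            hosts => ["192.0.2.11:9200"]
        }
    } else {
        elasticsearch {
            # Placeholder address standing in for ingest_node_2
            hosts => ["192.0.2.12:9200"]
        }
    }
}
```

With a layout like this, each event reaches exactly one ingest node, and the split between nodes depends entirely on how the conditions divide the traffic.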

But after checking the stack, reading the forums and doing some general research, I'm starting to wonder whether I should point the output at the master node instead of the ingest nodes.

Any recommendations?

No, do not make Logstash send data to dedicated master nodes. I think this is mentioned in the documentation.

With such a small cluster as yours I'd consider making all nodes master-eligible to avoid having one node that can take down the whole cluster.

The plan is to grow it to 2 master nodes, 4 ingest nodes and 4 data nodes, but for now this cluster is enough to handle the current volume of data. (Yes, I know replication is not working properly right now because we only have 1 data node; the first thing we will add is another data node.) That's why I was trying to make it work with this structure.

So basically I was right: Logstash needs to send the data directly to the ingest nodes, doesn't it? And is there any way to balance the load? I was thinking of putting all the ingest nodes behind a load balancer for Logstash, but I'm not sure whether this will work well.

OK, re-reading the docs, I found the answer:

To configure your Logstash instance to write to multiple Elasticsearch nodes, edit the output section of the second-pipeline.conf file to read:

output {
    elasticsearch {
        hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
    }
}

Use the IP addresses of three non-master nodes in your Elasticsearch cluster in the host line. When the hosts parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses. Also note that the default port for Elasticsearch is 9200 and can be omitted in the configuration above.

So I just need to list all the ingest nodes in the output definition and stop routing on my conditions (which, basically, were only there to balance the load).
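
A minimal sketch of what that output could look like for the two current ingest nodes, assuming placeholder addresses (substitute the real ones) and the default port 9200:

```
output {
    elasticsearch {
        # Placeholder addresses standing in for ingest_node_1 and ingest_node_2.
        # Per the docs quoted above, Logstash load-balances requests across this list.
        # The second entry omits the port, so the default (9200) applies.
        hosts => ["192.0.2.11:9200", "192.0.2.12"]
    }
}
```

As the cluster grows to more ingest nodes, their addresses would simply be appended to the `hosts` list.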

Thanks!
