Beats protocol

What protocol does the beats input in Logstash use: TCP or UDP?
Can I configure which protocol is used, i.e. TCP or UDP?

I suggest you move your question to the Beats category where the Beats folks hang out.

The protocol uses TCP.

The beats -> logstash protocol is called lumberjack. Do not configure the tcp input in Logstash if you want to receive data from Beats; use the beats input.

The lumberjack protocol sits on top of TCP. With TLS support, you have either TCP/lumberjack or TCP/TLS/lumberjack. Even though TCP is used, Beats may still drop events if Logstash is not available. Filebeat will never drop events, but other Beats retry at most 3 times before dropping them (the retry count is configurable in Beats).
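As a rough sketch of where that retry setting lives, assuming a Beat other than Filebeat and the `max_retries` option of the Logstash output (the host name is a placeholder, and 5044 is only the conventional beats port):

```yaml
# Hypothetical config fragment for a Beat other than Filebeat.
output:
  logstash:
    # Placeholder host; the beats input in Logstash listens on this TCP port.
    hosts: ["logstash.example.com:5044"]
    # Number of times to retry publishing before events are dropped (default 3).
    # Filebeat ignores this setting and keeps retrying.
    max_retries: 3
```

Check the configuration reference for your Beat and version before relying on the exact option name or default.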

Thank you.
The reason I asked this question is:
Currently I am using the following setup to ship data from the application nodes to a central Logstash server.
On the individual application nodes, a Logstash instance runs with:
Input - application log file
Output - UDP
On the central Logstash server:
Input - UDP
Output - Kafka

The application nodes and the Logstash server will run in separate VLANs in production, so I requested that UDP port xxxx be opened from the application nodes to the Logstash server.

Now I am planning to replace the Logstash instances on the individual application nodes with Filebeat to ship the logs to the centralised Logstash server. With this approach, the config on the central Logstash server will be:
Input - beats
Output - Kafka
Can you please let me know what the configuration on the application nodes should be for Filebeat?
Input - application log files
Output - lumberjack

Please suggest.

So your current setup is:

application -> logstash -> kafka -> logstash -> ... ?

and you want to change it into:

application -> disk -> filebeat -> logstash -> kafka -> logstash -> ... ?

With Filebeat 5.0 we're introducing a Kafka output in Filebeat. Why not:

application -> disk -> filebeat -> kafka -> logstash -> ... ?
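For reference, a minimal sketch of what the output section might look like when shipping straight to Kafka from Filebeat 5.0. The broker addresses and topic name are placeholders, and option names may differ between pre-releases, so treat this as an assumption to verify against the configuration reference for your version:

```yaml
# Hypothetical Filebeat 5.0 output fragment; the prospector section is
# unchanged from any other output.
output:
  kafka:
    # Placeholder broker addresses; every shipper needs to know these.
    hosts: ["kafka1.example.com:9092", "kafka2.example.com:9092"]
    # Placeholder topic name.
    topic: "app-logs"
```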

Have you read the getting started guide? Step 3 explains how to configure with logstash. The configuration reference explains additional output options.
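A minimal sketch of the Filebeat side for the filebeat -> logstash setup, assuming a hypothetical log path and Logstash host. There is no separate "lumberjack" output to pick: the `logstash` output speaks that protocol over TCP, so the port to open between the VLANs is a TCP port, not UDP:

```yaml
# Hypothetical filebeat.yml for one application node.
filebeat:
  prospectors:
    - input_type: log
      paths:
        # Placeholder path to the application log files.
        - /var/log/myapp/*.log
output:
  logstash:
    # Placeholder host; 5044 is the conventional port for the beats input.
    hosts: ["logstash.example.com:5044"]
```

On the central server, the beats input would listen on the same port (5044 here), with Kafka as the output.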

Thank you for the response.
application -> disk -> filebeat -> kafka -> logstash -> ... ?
With the above approach, every shipper gets a Kafka dependency [in this case the Kafka hosts and other required configuration]. We have around 500 application nodes from which we ship data to ELK.

So my idea was to use
application -> disk -> filebeat -> logstash -> kafka -> logstash -> ES Cluster

Do you see any issues with the above approach?

Tl;dr: logstash 2.x + filebeat 1.x releases: test for performance. logstash/filebeat 5.x releases: I don't see any major issues with this approach.

Both approaches are legit, I think; it really depends on your requirements. With your approach, the shippers are configured with Logstash hosts instead of Kafka hosts, but hosts still need to be configured either way. Also, with this many application nodes, you might end up requiring multiple Logstash nodes to scale ingestion.

The only problem I see (at the moment) is performance-related, on the Logstash side. There is a known issue with the beats input in Logstash being slow. A rewrite that helps with performance is under active development. Having tested an early POC of the rewrite, I was quite impressed with the throughput (I hope 5.0 beta1 will include a beta version of the rewrite).

That is, do test throughput and use whatever architecture makes sense for you, provided performance matches expectations (you can still scale by adding more Logstash nodes). With the 5.0 releases I wouldn't be worried about Logstash performance at all.

One advantage of using Logstash to send data to Kafka is that the Logstash Kafka plugins support more recent Kafka releases, thanks to running on the JVM. Beats require a custom third-party driver that so far supports only Kafka 0.8 and 0.9.
