Cons of having Apache Kafka between two Logstash instances

Hello. Since Logstash 5.4, the document https://www.elastic.co/guide/en/logstash/5.4/deploying-and-scaling.html suggests connecting Beats directly to Kafka instead of using a "Beats -> Logstash shipper -> Kafka -> Logstash indexer -> Elasticsearch" infrastructure. Why? Does putting Kafka between two Logstash instances have some disadvantages?

It's a more complex setup with no obvious gains.

I see at least 3 gains:

  • Independence from the queueing solution. If problems arise, we could switch to another solution or temporarily run without an additional queue, without changing anything on the many Beats clients.
  • Simple event-based authorization. Each of our Beats sends "cluster" and "authkey" fields in each event. I wrote a small Logstash authorization plugin which passes or drops the event based on these fields. This auth should happen in the first Logstash, before enqueueing to Kafka. The reason is simple: I do not manage all the Beats; some of them are managed by other ops or dev teams, so a typo in a Beats config will not make a mess in our indexes (index names are generated from the "cluster" field, so it is not possible to write to another index in this case). An alternative would be to have hundreds of independent Kafka topics and do ACL-based auth...
  • The ability to apply some simple logic in the first Logstash to classify messages, e.g. if a message contains confidential data, anonymize it, forward it to another solution, or drop it...
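
To make the second point more concrete, here is a rough Python sketch of the pass-or-drop check. It is not the actual Logstash plugin (that code is not shown here); the `AUTH_KEYS` table, the function names, and the `logs-` index prefix are all invented for illustration. Only the "cluster" and "authkey" field names come from the description above.

```python
import hmac
from typing import Optional

# Hypothetical table of cluster name -> expected auth key.
# A real setup would load this from a protected store, not from source code.
AUTH_KEYS = {
    "cluster-a": "key-a",
    "cluster-b": "key-b",
}


def authorize(event: dict) -> Optional[dict]:
    """Return the event if its cluster/authkey pair is known, else None (drop)."""
    cluster = event.get("cluster")
    authkey = event.get("authkey")
    # Events missing either field, or carrying non-string values, are dropped.
    if not isinstance(cluster, str) or not isinstance(authkey, str):
        return None
    expected = AUTH_KEYS.get(cluster)
    if expected is None:
        # Unknown cluster name, e.g. a typo in a Beats config.
        return None
    # Constant-time comparison of the supplied key against the expected one.
    if not hmac.compare_digest(authkey.encode(), expected.encode()):
        return None
    return event


def index_name(event: dict) -> str:
    """Derive the index name from the (already authorized) cluster field."""
    return f"logs-{event['cluster']}"


if __name__ == "__main__":
    good = {"cluster": "cluster-a", "authkey": "key-a", "message": "hello"}
    typo = {"cluster": "cluster-x", "authkey": "key-a", "message": "hello"}
    for ev in (good, typo):
        passed = authorize(ev)
        print("drop" if passed is None else index_name(passed))
```

Because the index name is built only from a "cluster" value that has already been matched against its key, a misconfigured client is dropped before anything reaches Kafka rather than writing into someone else's index.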
