Filebeat to send logs to Kafka, and if Kafka is down, failover to Elasticsearch.
Hi @Thirupathi,
Welcome! Multiple outputs are not supported by Filebeat, as covered by this post from 2020. However, as you'll see from the post, you can run 2 instances of Filebeat, with one sending directly to Kafka and the other to Elasticsearch.
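If you go down that route, the two instances could look roughly like this (a sketch only; paths, hosts and topic names are placeholders, and each instance needs its own path.data so the registries don't clash):

# Instance 1: filebeat-kafka.yml
path.data: /var/lib/filebeat-kafka
filebeat.inputs:
  - type: filestream
    id: app-logs-kafka
    paths:
      - /var/log/app/*.log
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "app-logs"

# Instance 2: filebeat-es.yml
path.data: /var/lib/filebeat-es
filebeat.inputs:
  - type: filestream
    id: app-logs-es
    paths:
      - /var/log/app/*.log
output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]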
There isn't really a way to detect Kafka is down and redirect to Elasticsearch.
Hope that helps!
Can we use an alternative approach to send messages by calling a Java API from Filebeat?
No @Thirupathi, Filebeat watches and processes files. It's not intended for heavy duty ETL processing.
If you are wanting to send to multiple outputs rather than run 2 Filebeat instances, you may want to look at using Logstash instead. It has output plugins for both Kafka and Elasticsearch, and can read your log files via the file input plugin.
Multiple outputs can be configured for Logstash.
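As a rough sketch (hosts, paths and topic names are placeholders, and most settings are omitted), a single pipeline can write every event to both:

input {
  file {
    path => ["/var/log/app/*.log"]
  }
}

output {
  # with no conditionals, every event goes to every output
  kafka {
    bootstrap_servers => "kafka1:9092"
    topic_id => "app-logs"
  }
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "app-logs"
  }
}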
Let us know if that better suits your use case.
"My first option is Kafka. If Kafka is down or not running at that time, the message should go to Elasticsearch
If Kafka is the primary output, but messages should go to Elasticsearch when Kafka is down, you need to configure Kafka as the primary output and use Elasticsearch as a fallback.
Any update on this requirement?
Hi @Thirupathi,
Can you explain what update you're looking for? As stated above, Filebeat doesn't support multiple outputs, but you may be able to do what you need with Logstash.
If you're specifically asking about sending to Elasticsearch when Kafka is down, it's not something I've done myself so I'm unsure. But you could investigate whether pinging an API with the http filter, combined with a conditional output, works in a pipeline.
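Very roughly, and untested, that might look like the sketch below. Kafka itself has no HTTP API, so the URL here assumes some health endpoint you run yourself (a REST proxy, a monitoring check, or similar), and bear in mind the http filter makes a request per event, which is expensive:

filter {
  http {
    # hypothetical endpoint that returns 200 while Kafka is healthy
    url => "http://kafka-health.example.internal/health"
    verb => "GET"
    target_body => "[@metadata][kafka_health]"
  }
}

output {
  # "_httprequestfailure" is the http filter's default failure tag,
  # check the plugin docs for your version
  if "_httprequestfailure" in [tags] {
    elasticsearch {
      hosts => ["https://elasticsearch:9200"]
    }
  } else {
    kafka {
      bootstrap_servers => "kafka1:9092"
      topic_id => "app-logs"
    }
  }
}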
Hope that helps!
@Thirupathi you seem stuck on the "Kafka as the primary output, Elasticsearch as a fallback" model.
You do know Logstash supports persistent queues, right?
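For reference, turning one on is a couple of settings in logstash.yml (the size and path here are just placeholders):

# logstash.yml
queue.type: persisted                 # default is "memory"
queue.max_bytes: 4gb                  # how much disk the queue may use
path.queue: /var/lib/logstash/queue   # optional, defaults to a dir under path.data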
If your Kafka output is later sending the data to the same Elasticsearch, and your idea is "I like the buffering of Kafka, but if Kafka is not there I'm happy to just chuck everything at Elasticsearch directly", then I don't want to be there the day you find that model isn't as sensible as you think it is. Put another way: either Kafka has a good reason to be there, or it doesn't.
This is not possible with any tool in the Stack.
With Filebeat you can have only one output: you can send data to Elasticsearch, Kafka or Logstash, but only to one of them.
If you send data from Filebeat to Logstash, you can then send it to other places, but you cannot choose to send data to a different output based on the availability of the outputs.
You can however try to build some logic in a Logstash configuration that would conditionally send logs to different outputs, but this is not something that is simple to do and would involve other external tools.
In case Kafka is down, will Filebeat redirect messages to the DLQ (Dead Letter Queue)? Please provide the configuration.
Filebeat does not have a DLQ; what it has is a queue for the events. By default this queue is in memory, but you can use a disk queue, which has performance impacts.
The configuration is here.
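Roughly, it looks like this in filebeat.yml (the size is a placeholder). Keep in mind it only buffers events while the output is unreachable; it does not switch outputs:

queue.disk:
  max_size: 10GB
  path: "${path.data}/diskqueue"   # optional, this is the default location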
If Kafka is down, can Logstash fall back to Elasticsearch?
Can you explain why it has to be Kafka and then Elasticsearch as a fallback? What are you trying to do?
As covered above you can send to Kafka and Elasticsearch at the same time using Logstash (and not Filebeat) to give you contingency in the event of a failure, or you have to build custom logic in Logstash using scripts to figure out where to send to as @leandrojmp outlines. Or you can make use of persistent queues instead as @RainTown suggests.
Let us know!
To achieve a failover mechanism where logs/messages are sent to Kafka first and automatically redirected to Elasticsearch if Kafka is down, you can implement this in Logstash using conditional logic.
This thread is going a bit in circles (ironically).
You have asked the same question repeatedly. Do you not believe the answers you have been given?
To which "you" do you refer?
Also, though it's not answering your question the way you want, please read the bit I wrote above: either you need/want Kafka in your flow (and there are certainly good arguments for this) or you don't. To wish sometimes to have it in the flow is ... an unusual approach.
Imagine something goes a bit mad and Kafka falls over due to massive load. What should be done now? Send the massive load directly to Elasticsearch? Really? That's an interesting strategy. But YMMV. If you don't trust your Kafka setup, then make it better. If it's flaky, improve it.
As mentioned, there is nothing native that will select one output based on the availability of another output, but knowing how Logstash works and that it can get information from external sources, you can implement some logic to do that.
One example is this one:
filter {
  mutate {
    # add the routing field with a static value, this value is the
    # lookup key for the dictionary below
    add_field => {
      "[@metadata][output]" => "output"
    }
  }
  translate {
    source => "[@metadata][output]"
    target => "[@metadata][output]"
    # make sure the dictionary value replaces the static value
    override => true
    dictionary_path => "/path/to/a/yml/dictionary/output.yml"
    refresh_interval => 30
  }
}

output {
  if [@metadata][output] == "kafka" {
    kafka {
      # kafka output settings go here
    }
  }
  if [@metadata][output] == "elasticsearch" {
    elasticsearch {
      # elasticsearch output settings go here
    }
  }
}
In the filter block you add a field, [@metadata][output], with the static value output. Then you use a translate filter to replace the value of this field with a value stored in an external yml file; Logstash re-reads that file every 30 seconds (the refresh_interval).
The content of this file would be something like this:
"output": "kafka"
So, when the pipeline runs, the value of [@metadata][output] is replaced by kafka, and in the output block the kafka output is used.
If you change the yml dictionary to something like:
"output": "elasticsearch"
Then, when Logstash refreshes this in memory after 30 seconds, the Elasticsearch output will be used instead.
This is one way to have a conditional output.
Updating the yml dictionary according to the availability of the outputs has nothing to do with Logstash; you need to build that yourself using scripts, automation tools, or whatever you want. It is a simple automation task.
But the main question is, what issue are you currently having? Kafka is expected to run in clusters, so you have more than one broker for high availability; the same goes for Elasticsearch.
For Kafka to be down, it means that the entire cluster with multiple brokers is having issues, so you have bigger problems than Filebeat or Logstash switching outputs.