How do I upgrade Logstash in a Filebeat + Logstash pipeline?

I'm working on a system that uses Filebeat (v7.8) with the Logstash output, pointing at Logstash v2.2.4.
Filebeat reads files from a specific folder and publishes the data to Logstash, which in turn sends the output to RabbitMQ.

I need to upgrade Logstash to the latest version (or at least a newer one) so that I can use forked pipelines for a single-input, multiple-config scenario.
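
For context, the kind of forked-path setup I'm hoping to move to looks roughly like this; the hosts, exchange, and topic names are just placeholders, and I haven't tested this on the newer version yet:

# pipelines.yml: one intake pipeline fans events out to two output pipelines
- pipeline.id: intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["rabbitmq_out", "kafka_out"] } }

- pipeline.id: rabbitmq_out
  config.string: |
    input { pipeline { address => "rabbitmq_out" } }
    output {
      rabbitmq {
        host => "rabbitmq.example.local"   # placeholder
        exchange => "logs"
        exchange_type => "direct"
        key => "filebeat"
      }
    }

- pipeline.id: kafka_out
  config.string: |
    input { pipeline { address => "kafka_out" } }
    output {
      kafka {
        bootstrap_servers => "kafka.example.local:9092"   # placeholder
        topic_id => "logs"
      }
    }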

What steps should I follow to prevent data loss? Is there a way to persist the checkpoint of the last event read by Filebeat so that the new Logstash can resume from there?

I did a dummy setup to see whether I could upgrade Logstash without data loss.

I set up Filebeat to read a CSV file (containing only line numbers, for easy verification) and push the records to Logstash.
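
For reference, the test Filebeat config looked roughly like this (the CSV path is just a placeholder for my test file):

filebeat.inputs:
  - type: log
    paths:
      - /path/to/test/numbers.csv   # CSV containing only line numbers

output.logstash:
  hosts: ["localhost:5044"]         # the test Logstash instance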

After about half the records had been written, I stopped Logstash and changed its output to a new file path, then restarted Logstash with the new config.
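
The test Logstash pipeline was just a beats input and a file output; for the second run I only changed the output path (both paths are placeholders):

input {
  beats {
    port => 5044
  }
}

output {
  file {
    # first run wrote here; for the restarted run I pointed this at a second file
    path => "/tmp/logstash-test/out-1.log"
  }
}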

A few issues I noticed:

  • Many records were duplicated in the new file.
  • Duplicates also appeared within the same file.
  • The records were not in any particular order in either file.

I saw the same issues when I stopped Filebeat first, then Logstash, changed the Logstash output path, and restarted Logstash followed by Filebeat.

Is there a way to avoid duplicates and potential data loss?

You should be able to upgrade without any data loss. Duplicates can result from events that were in flight while Logstash was stopped: those events had not been ACKed, so Filebeat sends them again to prevent data loss.

The simplest way to avoid duplicates in this scenario is to add an ID to the events using the add_id processor in Filebeat:

processors:
  # add_id generates a unique ID per event, stored in @metadata._id by default
  - add_id: ~

This gives each event a unique ID, so if an event is resent because Logstash went down, it will not create a duplicate: a document with that ID already exists in Elasticsearch.

Run the same test again to verify your config is working properly. You'll probably need to set document_id on your Elasticsearch output so the ID is actually used when indexing (document_id => "%{[@metadata][_id]}").
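
Something like this in the Logstash output, as a rough sketch (host and index name are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]     # placeholder
    index => "filebeat-test"               # placeholder
    document_id => "%{[@metadata][_id]}"   # reuse the ID generated by add_id
  }
}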

I saw this after I was done responding. It goes into more detail and includes fingerprint as an alternative to add_id: Deduplicate data | Filebeat Reference [7.14] | Elastic

But I'm not pushing the data to Elasticsearch, only to Logstash, which will send the output (through a forked pipeline) to RabbitMQ and Kafka.

How can I use @metadata._id to deduplicate there?

The deduplication method I described depends on Elasticsearch: it keeps track of each ID it has seen, so it can prevent duplicates. I don't think there is an equivalent feature in Kafka out of the box.
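
If you want the ID available to the downstream consumers so they can deduplicate on their side, one option (just a sketch, and the field name is arbitrary) is to copy it out of @metadata before the events leave Logstash, since @metadata fields are not serialized by most outputs:

filter {
  mutate {
    # copy the Filebeat-generated ID into the event body so RabbitMQ/Kafka consumers can see it
    copy => { "[@metadata][_id]" => "event_id" }
  }
}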

How can I gracefully move to a new Logstash installation, though?

Should I just stop the current service and do a fresh installation?
Or should I stop Filebeat, then stop, install, and start the new Logstash, and then restart Filebeat?

I want to avoid receiving any extra warning/log messages.
Also, I can see that filebeat-god is in use, so if I need to restart, will a simple service filebeat stop followed by service filebeat start do?
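
For clarity, this is roughly the sequence I have in mind, assuming both are managed as regular services:

service filebeat stop      # stop the shipper first so no new events are in flight
service logstash stop
# install/upgrade Logstash here
service logstash start
service filebeat start     # the Filebeat registry should let it resume from the last ACKed offset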
