How do I upgrade Logstash in a Filebeat + Logstash pipeline?

I'm working on a system that uses Filebeat (v7.8) with the Logstash output, pointing at Logstash v2.2.4.
Filebeat reads files from a specific folder and publishes the data to Logstash, which in turn sends the output to RabbitMQ.

I need to upgrade Logstash to the latest version (or at least a newer one) so that I can use forked pipelines for a single-input, multiple-config scenario.
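
For context, the kind of forked-path setup I'm hoping to move to looks roughly like this; the hosts, exchange, and topic names are just placeholders, and I haven't tested this on the newer version yet:

# pipelines.yml: one intake pipeline fans events out to two output pipelines
- pipeline.id: intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["rabbitmq_out", "kafka_out"] } }

- pipeline.id: rabbitmq_out
  config.string: |
    input { pipeline { address => "rabbitmq_out" } }
    output {
      rabbitmq {
        host => "rabbitmq.example.local"   # placeholder
        exchange => "logs"
        exchange_type => "direct"
        key => "filebeat"
      }
    }

- pipeline.id: kafka_out
  config.string: |
    input { pipeline { address => "kafka_out" } }
    output {
      kafka {
        bootstrap_servers => "kafka.example.local:9092"   # placeholder
        topic_id => "logs"
      }
    }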

What steps should I follow to prevent data loss? Is there a way to persist the checkpoint of the last event read by Filebeat so that the new Logstash can resume from there?

I did a dummy setup to see whether I could upgrade Logstash without data loss.

I set up Filebeat to read a CSV file (containing only line numbers, for easy verification) and push the records to Logstash.
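
For reference, the test Filebeat config looked roughly like this (the CSV path is just a placeholder for my test file):

filebeat.inputs:
  - type: log
    paths:
      - /path/to/test/numbers.csv   # CSV containing only line numbers

output.logstash:
  hosts: ["localhost:5044"]         # the test Logstash instance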

After about half the records had been written, I stopped Logstash and changed its output to a new file path, then restarted Logstash with the new config.
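
The test Logstash pipeline was just a beats input and a file output; for the second run I only changed the output path (both paths are placeholders):

input {
  beats {
    port => 5044
  }
}

output {
  file {
    # first run wrote here; for the restarted run I pointed this at a second file
    path => "/tmp/logstash-test/out-1.log"
  }
}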

A few issues I noticed:

  • Many records were duplicated in the new file.
  • Duplicates also appeared within the same file.
  • The records were not in any particular order in either file.

I saw the same issues when I stopped Filebeat first, then Logstash, changed the Logstash output path, and restarted Logstash followed by Filebeat.

Is there a way to avoid duplicates and potential data loss?

You should be able to upgrade without any data loss. Duplicates can result from events that were in flight while Logstash was stopped: those events had not been ACKed, so Filebeat sends them again to prevent data loss.

The simplest way to avoid duplicates in this scenario is to add an ID to the events using the add_id processor in Filebeat:

processors:
  # add_id generates a unique ID per event, stored in @metadata._id by default
  - add_id: ~

This gives each event a unique ID, so if an event is resent because Logstash went down, it will not create a duplicate: a document with that ID already exists in Elasticsearch.

Run the same test again to verify your config is working properly. You'll probably need to set document_id on your Elasticsearch output so the ID is actually used when indexing (document_id => "%{[@metadata][_id]}").
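
Something like this in the Logstash output, as a rough sketch (host and index name are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]     # placeholder
    index => "filebeat-test"               # placeholder
    document_id => "%{[@metadata][_id]}"   # reuse the ID generated by add_id
  }
}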

I saw this after I was done responding. It goes into more detail and includes fingerprint as an alternative to add_id: Deduplicate data | Filebeat Reference [7.14] | Elastic

But I'm not pushing the data to Elasticsearch, only to Logstash, which will send the output (through a forked pipeline) to RabbitMQ and Kafka.

How can I use @metadata._id to deduplicate there?

The deduplication method I described depends on Elasticsearch: it keeps track of each ID it has seen, so it can prevent duplicates. I don't think there is an equivalent feature in Kafka out of the box.
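
If you want the ID available to the downstream consumers so they can deduplicate on their side, one option (just a sketch, and the field name is arbitrary) is to copy it out of @metadata before the events leave Logstash, since @metadata fields are not serialized by most outputs:

filter {
  mutate {
    # copy the Filebeat-generated ID into the event body so RabbitMQ/Kafka consumers can see it
    copy => { "[@metadata][_id]" => "event_id" }
  }
}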

How can I gracefully move to a new Logstash installation, though?

Should I just stop the current service and do a fresh installation?
Or should I stop Filebeat, then stop, install, and start the new Logstash, and then restart Filebeat?

I want to avoid receiving any extra warning/log messages.
Also, I can see that filebeat-god is in use, so if I need to restart, will a simple service filebeat stop followed by service filebeat start do?
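
For clarity, this is roughly the sequence I have in mind, assuming both are managed as regular services:

service filebeat stop      # stop the shipper first so no new events are in flight
service logstash stop
# install/upgrade Logstash here
service logstash start
service filebeat start     # the Filebeat registry should let it resume from the last ACKed offset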
