DLQ Errors after upgrading ELK (Elasticsearch and Kibana) from 7.17 to 8.15.2

Hi All,

Our log management pipeline is Beats -> MSK (Kafka) -> Logstash -> Elasticsearch -> Kibana. We upgraded our Elastic cluster (application-test-cluster) from 7.17 to 8.15.2 in October, and since the upgrade we have been seeing errors like the following in logstash-plain.log:

[2024-12-04T14:38:12,692][ERROR][org.logstash.common.io.DeadLetterQueueWriter][abstraction-layer-pipeline][e1e08fc0ce36265a506a698847dd945b3b0d6b4e1a5b16b48b410fc9c57d56ab] Cannot write event to DLQ(path: /data/logstash/dead_letter_queue/abstraction-layer-pipeline): reached maxQueueSize of 1073741824

Upon inspecting the DLQ files, we identified multiple reasons contributing to this error. Here are a few examples:

  1. "failed to parse field [State.EndpointName] of type [text] in document with id 'a53025231c6c8aa3b2dd4058e64a44e3e295d8fd'"
  2. "Expected text at 1:2013 but found START_OBJECT"}}}}câ–’â–’â–’â–’â–’â–’â–’â–’U2024-12-03T07:55:24.532244483Z â–’â–’qjava.util.HashMapâ–’dDATAâ–’xorg.logstash.ConvertedMapâ–’gMessageâ–’torg.jruby.RubyStringx'Executing endpoint 'Health checks'â–’h@versionâ–’torg.jruby."
  3. "object mapping for [source] tried to parse field [source] as object, but found a concrete value"}}}s10STâ–’â–’1g2024-12-03T03:42:22.721571195Z*â–’â–’qjava.util.HashMapâ–’dDATAâ–’xorg.logstash.ConvertedMapâ–’nrequestHeadersâ–’xorg.logstash.ConvertedMapâ–’wX-Client-Transaction-IDâ–’torg.jruby."

We added the following entry to our pipelines.yml:


# CS Testing Oct Pipeline, single pipeline to read app logs
- pipeline.id: testing-pipeline
  path.config: "/etc/logstash/conf.d/testing-pipeline.conf"
  pipeline.ecs_compatibility: disabled

Unfortunately, this change has not resolved the issue and the errors persist. Could anyone confirm whether these Dead Letter Queue (DLQ) errors are causing data loss, and how we can resolve them across all our pipelines?

Regards

Your errors are caused by mapping conflicts in Elasticsearch; to fix them, you need to resolve those conflicts.

Those errors mean that the data you are trying to index has a different mapping from the one defined in your index or index template.
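If it helps, you can inspect how the conflicting fields from your DLQ excerpts are currently mapped (the index name here is a placeholder, replace it with yours):

GET your-index/_mapping/field/State.EndpointName,source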

It depends on your input, but yes, this can lead to data loss.

Once the DLQ reaches its maximum size, new failed events can no longer be written to it and are dropped (that is exactly the error you are seeing); you would need to increase the DLQ size, remove old data, or disable the DLQ, which also means losing those events.
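For illustration, these settings go in logstash.yml (or per-pipeline in pipelines.yml); the values below are only examples to adapt to your disk budget:

dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 2gb               # default is 1024mb, the 1073741824-byte limit in your error
dead_letter_queue.storage_policy: drop_older   # default is drop_newer
dead_letter_queue.retain.age: 7d               # age-based cleanup, available since 8.4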

Events are sent to the DLQ when they cannot be indexed, and you would need to use the dead_letter_queue input to reprocess that data.
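A minimal reprocessing pipeline could look like this sketch (the path and pipeline_id match the error in your log; the hosts and output index name are placeholders):

input {
  dead_letter_queue {
    path => "/data/logstash/dead_letter_queue"
    pipeline_id => "abstraction-layer-pipeline"
    commit_offsets => true
  }
}

filter {
  # keep the original failure reason from the event metadata for inspection
  mutate {
    add_field => { "failure_reason" => "%{[@metadata][dead_letter_queue][reason]}" }
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "dlq-reprocessed"
  }
}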

You need to fix the mappings of your index or adjust the conflicting fields during ingestion; the DLQ is a symptom of the problem, not the problem itself.
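As an illustration only (the sub-field name is made up, and the right fix depends on your template), a ruby filter can reshape a scalar [source] so it no longer collides with an object mapping:

filter {
  ruby {
    code => '
      # if [source] arrives as a plain string while the index maps it as an
      # object, move the value into a sub-field so the shapes agree
      src = event.get("source")
      if src.is_a?(String)
        event.remove("source")
        event.set("[source][original]", src)
      end
    '
  }
}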
