Heavy Traffic on Logstash causing data inconsistencies

keeshqs · August 3, 2018, 3:34am

Hi all,

We would like to ask for your advice on an issue we have with Logstash. We currently receive data from 18 servers/clients, each with 9 check metrics that create 9 indices in ES per day. Each client's event data is sent to Logstash in parallel every 10 minutes via TCP (sometimes there are thousands of records for a single client). The issue is that, now that we are in Production, we observe inconsistencies and missing data in Elasticsearch, even though the data should have been received and processed by Logstash.

We cannot find any relevant messages (warning or critical) in the Logstash logs, even with logging set to debug. The monitoring dashboard also shows a green health status for both Logstash and Elasticsearch. Average CPU utilization is low, at only about 15%, and used JVM heap is 456 MB.
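Since the logs show nothing, one way to narrow down where events go missing is to compare Logstash's own event counters. The sketch below is a minimal Python example, assuming the Logstash monitoring API is reachable at `localhost` on its default port 9600 (adjust for your host); it reads `_node/stats/pipelines` and reports the `in`, `filtered` and `out` counters for the `main` pipeline. If `in` and `out` match while the pipeline is idle between the 10-minute bursts, the events did leave Logstash and the gap is more likely on the Elasticsearch side (or before Logstash). A persistent difference that is not explained by `drop` or `split` filters suggests events are being lost or held inside the pipeline.

```python
import json
import urllib.request

# ASSUMPTION: Logstash's monitoring API listens on localhost, default port 9600.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_pipeline_stats(url: str = STATS_URL) -> dict:
    """Fetch pipeline statistics from the Logstash node stats API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def event_counters(stats: dict, pipeline: str = "main") -> dict:
    """Extract the in/filtered/out counters and the in-minus-out gap."""
    events = stats["pipelines"][pipeline]["events"]
    return {
        "in": events["in"],
        "filtered": events["filtered"],
        "out": events["out"],
        # Non-zero while events are in flight; should return to 0 when idle.
        "in_minus_out": events["in"] - events["out"],
    }
```

Usage: `print(event_counters(fetch_pipeline_stats()))`, run once right after a burst and again a few minutes later.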

We replayed some of the data that should have been processed in our DEV environment to see whether we could reproduce the issue, but everything worked fine there. The only thing we cannot reproduce is the volume of traffic coming into the system, since this is production data. We therefore suspect the problem may be related to the higher event load in Production compared to Development. Our configuration is fairly default, so we're not sure whether we missed anything.
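To approximate the production traffic pattern in DEV, a small load generator can open 18 parallel TCP connections and push a burst of events from each, the way the production clients do every 10 minutes. This is a minimal Python sketch under several assumptions: the DEV pipeline has a `tcp` input with `codec => json_lines` on a hypothetical `localhost:5000`, and the field names (`client`, `metric`, `seq`) and the burst size of 1000 events per metric are made up for illustration. Because every event carries a unique `client`/`metric`/`seq` combination, the documents that reach Elasticsearch can be counted per client and metric to see exactly which ones are missing.

```python
import json
import socket
import threading

# HYPOTHETICAL settings: point these at the DEV Logstash tcp input,
# assumed to be configured with codec => json_lines.
LOGSTASH_HOST = "localhost"
LOGSTASH_PORT = 5000
CLIENTS = 18              # number of production clients
METRICS = 9               # check metrics per client
EVENTS_PER_METRIC = 1000  # hypothetical burst size


def build_payload(client_id: int, metric_id: int, count: int) -> bytes:
    """Build newline-delimited JSON events, each with a sequence number."""
    lines = [
        json.dumps({"client": f"client-{client_id}",
                    "metric": f"metric-{metric_id}",
                    "seq": seq})
        for seq in range(count)
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_client_burst(client_id: int, host: str, port: int,
                      events_per_metric: int) -> None:
    """Send one client's burst (all metrics) over a single TCP connection."""
    with socket.create_connection((host, port), timeout=30) as sock:
        for metric_id in range(METRICS):
            sock.sendall(build_payload(client_id, metric_id, events_per_metric))


def run_burst(host: str = LOGSTASH_HOST, port: int = LOGSTASH_PORT,
              clients: int = CLIENTS,
              events_per_metric: int = EVENTS_PER_METRIC) -> int:
    """Send all client bursts in parallel; return the number of events attempted."""
    threads = [
        threading.Thread(target=send_client_burst,
                         args=(c, host, port, events_per_metric))
        for c in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return clients * METRICS * events_per_metric
```

With these hypothetical settings, `run_burst()` attempts 18 × 9 × 1000 = 162,000 events; comparing that number with the document count in Elasticsearch shows whether the DEV setup loses events under a similar load.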

Elasticsearch: 6.3.0
Logstash: 6.3.0
Kibana: 6.3.0

    # ------------ Pipeline Settings --------------
    #
    # The ID of the pipeline.
    #
    pipeline.id: main
    #
    # Set the number of workers that will, in parallel, execute the filters+outputs
    # stage of the pipeline.
    #
    # This defaults to the number of the host's CPU cores.
    #
    pipeline.workers: 2
    #
    # How many events to retrieve from inputs before sending to filters+workers
    #
    pipeline.batch.size: 125
    #
    # How long to wait in milliseconds while polling for the next event
    # before dispatching an undersized batch to filters+outputs
    #
    pipeline.batch.delay: 50
    #
    # Force Logstash to exit during shutdown even if there are still inflight
    # events in memory. By default, logstash will refuse to quit until all
    # received events have been pushed to the outputs.
    #
    # WARNING: enabling this can lead to data loss during shutdown
    #
    pipeline.unsafe_shutdown: false
    #
    # ------------ Queuing Settings --------------
    #
    # Internal queuing model, "memory" for legacy in-memory based queuing and
    # "persisted" for disk-based acked queueing. Default is memory
    #
    queue.type: memory
    #
    # If using queue.type: persisted, the directory path where the data files will be stored.
    # Default is path.data/queue
    #
    path.queue:
    #
    # If using queue.type: persisted, the page data files size. The queue data consists of
    # append-only data files separated into pages. Default is 64mb
    #
    queue.page_capacity: 64mb
    #
    # If using queue.type: persisted, the maximum number of unread events in the queue.
    # Default is 0 (unlimited)
    #
    queue.max_events: 0
    #
    # If using queue.type: persisted, the total capacity of the queue in number of bytes.
    # If you would like more unacked events to be buffered in Logstash, you can increase the
    # capacity using this setting. Please make sure your disk drive has capacity greater than
    # the size specified here. If both max_bytes and max_events are specified, Logstash will pick
    # whichever criteria is reached first
    # Default is 1024mb or 1gb
    #
    queue.max_bytes: 1024mb
    #
    # If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
    # Default is 1024, 0 for unlimited
    #
    queue.checkpoint.acks: 1024
    #
    # If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
    # Default is 1024, 0 for unlimited
    #
    queue.checkpoint.writes: 1024
    #
    # If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
    # Default is 1000, 0 for no periodic checkpoint.
    #
    queue.checkpoint.interval: 1000
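The Logstash performance-tuning documentation describes the total number of in-flight events as the product of `pipeline.workers` and `pipeline.batch.size`. With the values in the settings above, that is 2 × 125 = 250 events at a time, and with the memory queue Logstash applies back-pressure to its inputs when it cannot keep up rather than buffering more. The short Python sketch below only does this arithmetic; the burst size in the usage line is hypothetical, since the post only says "thousands of records" per client.

```python
import math
import os


def inflight_capacity(workers: int, batch_size: int) -> int:
    """Maximum events held by pipeline workers at once (workers x batch size)."""
    return workers * batch_size


def default_workers() -> int:
    """pipeline.workers defaults to the host's CPU core count (1 if unknown)."""
    return os.cpu_count() or 1


def batches_for_burst(burst_events: int, batch_size: int) -> int:
    """Number of worker batches needed to process a burst of events."""
    return math.ceil(burst_events / batch_size)
```

For example, `inflight_capacity(2, 125)` returns 250, and a hypothetical burst of 18 clients × 1,000 events = 18,000 events is split into `batches_for_burst(18000, 125)` = 144 batches.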

system · August 31, 2018, 3:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
