Identifying Pipeline Bottlenecks (Lost Events)

First time poster, long time lurker.

I am using the Elastic Stack to process and analyze XML documents which are sent to me over HTTP. I currently have the following pipeline set up to realise this behaviour:

  1. Node.JS (receives the document over HTTP and does some processing)
  2. Logstash formats the XML as JSON and does further processing on some fields (a rough sketch of this configuration follows the list)
  3. Elastic indexes the documents
    (4. Kibana for visualisation)
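
For reference, the Logstash side of this (steps 2 and 3) looks roughly like the sketch below. The port, field names, hosts and index pattern are placeholders rather than my actual config:

```
input {
  tcp {
    port => 5000            # Node.JS forwards the raw XML here, one event per line
  }
}

filter {
  xml {
    source => "message"     # the raw XML string
    target => "doc"         # parsed into a nested JSON object
  }
  # ... further processing on some fields ...
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "events-%{+YYYY.MM.dd}"
  }
}
```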

This works great on our live system (Windows server + 3 x CentOS Elastic cluster); however, I am migrating to a containerized solution on our test system, to be eventually rolled out to live.

Many events (approximately 55%, or 5,135 events over a 15-minute period) are being lost on the test system and I do not know where. I know this because I can find event ids which exist on the live system but not on the test system (they share a data feed). Does anyone have any ideas on how I could go about identifying which part of this pipeline is the bottleneck causing events to be missed? Any help would be much appreciated.
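
In case it helps, this is the kind of ingress-side audit logging I am thinking of adding to Node.JS so I can diff the received ids against what ends up in the index. It is only a rough sketch using Node's built-in modules; the `<eventId>` element and the log path are placeholders for my real setup:

```js
// Rough sketch: log every received event id at the Node.JS ingress so the file
// can later be diffed against the ids that actually reach the Elasticsearch index.
// The <eventId> element and the log path are placeholders.
const http = require('http');
const fs = require('fs');

const audit = fs.createWriteStream('ingress-audit.log', { flags: 'a' });

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => { body += chunk; });
  req.on('end', () => {
    // Pull the event id out of the incoming XML (placeholder element name).
    const match = body.match(/<eventId>([^<]+)<\/eventId>/);
    if (match) audit.write(`${new Date().toISOString()} ${match[1]}\n`);

    // ... existing processing and forwarding to Logstash goes here ...

    res.writeHead(204);
    res.end();
  });
}).listen(8080);
```

The ids (or just the count) recorded over a window can then be compared against what a date-range query on the index returns for the same window, which should narrow down where the drop happens.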

Talking to myself here, for anyone who is using a similar setup. Logstash-to-Elastic uses backpressure, signalled through HTTP status codes (429 responses when the cluster is too busy to accept), to buffer events in memory. This means that, in theory, events cannot be lost between these two components.
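
One caveat on that: because the buffer is in memory by default, a Logstash crash or restart can still drop whatever is queued at that moment. Logstash's persistent queue feature moves the buffer to disk; enabling it is a couple of lines in logstash.yml (the size below is illustrative):

```
# logstash.yml -- illustrative values
queue.type: persisted      # default is "memory"
queue.max_bytes: 1gb       # cap on disk usage for the queue
```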

In my previous setup, Node.JS was sending events to Logstash over TCP and, if Logstash was too busy to accept, the send would time out. My solution had a timeout of 3 seconds and no retry, which meant those events were likely missed.
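
To illustrate, the forwarding code was doing something along these lines (simplified; host, port and framing are placeholders), and you can see that a timeout simply throws the event away:

```js
// Simplified sketch of a TCP forwarder that can silently drop events:
// if Logstash is applying backpressure and nothing happens on the socket
// within the timeout, the event is logged and discarded rather than retried.
const net = require('net');

function sendToLogstash(jsonLine) {
  const socket = net.connect({ host: 'logstash', port: 5000 });
  socket.setTimeout(3000); // the 3 second timeout mentioned above

  socket.on('connect', () => {
    socket.end(jsonLine + '\n'); // write the event and close
  });

  socket.on('timeout', () => {
    // No retry and no buffer here -- the event is lost.
    console.error('Logstash timed out, dropping event');
    socket.destroy();
  });

  socket.on('error', err => {
    console.error('Logstash connection error, dropping event:', err.message);
  });
}
```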

The solution is to put a message queue, such as Apache Kafka, between Node.JS and Logstash; then, for any events which are still lost, you can be fairly certain the loss happened within Node.JS.
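
As a rough sketch of what the Node.JS side looks like with Kafka in between (using the kafkajs client here, but any client will do; the broker address and topic name are placeholders):

```js
// Rough sketch: publish each received XML document to Kafka instead of writing
// straight to Logstash. kafkajs is just one client choice; broker and topic
// names are placeholders.
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'xml-receiver', brokers: ['kafka:9092'] });
const producer = kafka.producer();

// Connect once at startup; a failure here is loud and visible rather than
// an event silently disappearing.
producer.connect().catch(err => {
  console.error('Could not connect to Kafka:', err.message);
  process.exit(1);
});

async function publishEvent(xml) {
  await producer.send({
    topic: 'xml-events',
    messages: [{ value: xml }],
  });
}

module.exports = { publishEvent };
```

Logstash then reads from the same topic with its kafka input plugin, and because the topic is durable, anything Logstash cannot keep up with simply waits in Kafka instead of being dropped.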
