_node/stats - duration_in_millis vs queue_push_duration_in_millis

TL;DR
A higher queue_push_duration_in_millis than duration_in_millis means that your inputs are fast and your filter/outputs are slow.


The value of queue_push_duration_in_millis is a bit difficult to make sense of because it is dependent on whether you are using the memory or persistent queue.

Some inputs are single threaded, some are fixed multi-threaded and some are variable multi-threaded e.g. TCP, it creates a Thread per connection.

The memory queue is a synchronous hand off between an input thread (writer) and a worker thread (reader). A reader will wait for a writer thread and vice-versa. A reader, when it has moved to the front of the waiting list, will try to acquire a "batch" of events, by taking an event from any waiting writers (or leave with a short batch if it times out).
If you have many writers and, say 8 readers (# of cores) that are slow then the time spent waiting and the queue_push_duration_in_millis will be higher.

The persistent queue is async. The readers and writers have no dependency on each other but there is a queue wide read/write lock on it at the moment, so the readers and writers have to interleave their calls. If one has slow readers that are off filtering or outputting then the writers only have to interleave with themselves or if writers are waiting for new input then the readers are free to drain the queue, interleaving amongst themselves only (this is why we see that less workers with bigger batches can be faster than more workers with smaller batches when there is highly variable incoming event cadence).

To get a sense of duration_in_millis one needs to think of this as the time spent doing things to the batch - from the instant the batch is "opened" (at the end of the reader acquiring a batch described above) to the instant that the batch is "closed" after the filter and output sections are done.

The persistent section is included for other readers.

2 Likes