How to get in-memory queue statistics


#1

I use the persistent queue on my Logstash servers, and I can check the current state of the queue with:

curl -XGET 'localhost:9600/_node/stats/'

and get this response:

 "queue": {
    "events": 0,
    "type": "persisted",
    "capacity": {
      "queue_size_in_bytes": 37413424,
      "page_capacity_in_bytes": 67108864,
      "max_queue_size_in_bytes": 10737418240,
      "max_unread_events": 0
    }
  }
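
For reference, the fill level implied by these fields can be worked out from the two byte counts. The snippet below is only a minimal sketch: the function name `queue_fill_ratio` is made up for illustration, and it assumes you have already parsed the `queue` object out of the `_node/stats` response (where that object sits in the full response may differ between Logstash versions).

```python
def queue_fill_ratio(queue_stats):
    """Return used/max bytes for a persisted queue, or None if not reported.

    queue_stats is the parsed "queue" object from the stats response.
    """
    capacity = queue_stats.get("capacity")
    if queue_stats.get("type") != "persisted" or not capacity:
        # The memory queue only reports its type, so there is nothing to compute.
        return None
    max_bytes = capacity.get("max_queue_size_in_bytes", 0)
    if max_bytes <= 0:
        return None
    return capacity.get("queue_size_in_bytes", 0) / max_bytes


# The two "queue" objects quoted in this post.
persisted = {
    "events": 0,
    "type": "persisted",
    "capacity": {
        "queue_size_in_bytes": 37413424,
        "page_capacity_in_bytes": 67108864,
        "max_queue_size_in_bytes": 10737418240,
        "max_unread_events": 0,
    },
}
memory = {"type": "memory"}

persisted_ratio = queue_fill_ratio(persisted)
memory_ratio = queue_fill_ratio(memory)
print(f"persisted queue fill: {persisted_ratio:.4%}")
print(f"memory queue fill: {memory_ratio}")
```

For the persisted response above this comes out well under 1% full; for the memory queue the function returns `None`, because no capacity fields are reported.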

I tested the in-memory queue on other nodes, but _node/stats tells me nothing about queue usage there:

  "queue": {
    "type": "memory"
  },

How can I find out how full my queue is? Does Logstash store this data in the memory reserved by the JVM in jvm.options? I would like to ensure that Logstash has enough memory for both running and the queue.


(Guy Boertje) #2

The memory queue has a fixed size of roughly 1000 items: enough to keep feeding the workers, but not significant enough to be worth measuring.

Memory calcs are roughly based on this formula: average event size × batch_size × worker threads, where batch_size is 125 by default and worker threads is the number of CPU cores as detected by the JVM. This can increase during serialisation and deserialisation, while the original and its serialised form are in memory at the same time. There is also some overhead in the input buffers as data is read from the sources.
The persistent queue uses off-heap memory-mapped files, but only two pages are "open" at the same time.
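
As a rough illustration of that formula, here is a minimal Python sketch. The function name `estimate_inflight_bytes`, the 2 KiB average event size, and the 8 worker threads are assumptions chosen for the example, not measured values; `os.cpu_count()` merely stands in for the core count the JVM would detect.

```python
import os


def estimate_inflight_bytes(avg_event_bytes, batch_size=125, workers=None):
    """Rough in-flight event memory: average event size x batch_size x workers.

    This ignores the serialisation and input-buffer overhead mentioned above.
    """
    if workers is None:
        # Stand-in for the number of CPU cores detected by the JVM.
        workers = os.cpu_count() or 1
    return avg_event_bytes * batch_size * workers


# Assumed example: 2 KiB average events, default batch size, 8 workers.
estimate = estimate_inflight_bytes(2048, batch_size=125, workers=8)
print(f"{estimate} bytes, about {estimate / 1024 / 1024:.2f} MiB")
```

With those assumed numbers the in-flight events account for about 2 MiB, which is small next to something like a large filter dictionary held on the heap.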
I guess a typical scenario might be described as:

  1. All the workers have a batch of 125, some are doing filtering and some are outputting to Elastic.
  2. You have a translate filter that has loaded a 500 MB dictionary into memory.
  3. You have two jdbc inputs that are in the process of building events from fetched 250,000-row datasets that are held in memory.
  4. These inputs have filled the memory queue with 1000 events but the workers have not looped around to remove them from the queue.
  5. Both jdbc inputs have built their 501st event and are waiting for free space in the queue.
  6. Eventually the workers process all records.
  7. The jdbc inputs are not scheduled to fetch more records for at least another 20 seconds so they are idle.
  8. The workers sit idle while waiting for the queue to fill - there are no batches in-flight and the outputs are idle too.
  9. The translate filter dictionary memory consumption stays at 500 MB, and the JVM GC has collected all the unreferenced objects.

Note that this idle time, while good for CPU consumption, means lower overall throughput in GB per hour.
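
Steps 4 and 5 of the scenario can be mimicked with a toy bounded queue. This is only a sketch using Python's standard `queue.Queue`, not how Logstash implements its memory queue; the helper names `fill_until_blocked` and `take_batch` are made up for illustration, and the capacity of 1000 is the approximate figure quoted above.

```python
import queue

QUEUE_CAPACITY = 1000  # approximate fixed size quoted above
BATCH_SIZE = 125       # default batch size quoted above


def fill_until_blocked(q, events):
    """Offer events until the queue is full; return how many were accepted."""
    accepted = 0
    for event in events:
        try:
            q.put_nowait(event)
        except queue.Full:
            break  # a real input would wait here for free space
        accepted += 1
    return accepted


def take_batch(q, batch_size=BATCH_SIZE):
    """Remove up to batch_size events, as one worker loop iteration would."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


q = queue.Queue(maxsize=QUEUE_CAPACITY)
accepted = fill_until_blocked(q, range(250_000))  # stand-in for a large dataset
batch = take_batch(q)
print(accepted, len(batch), q.qsize())  # prints: 1000 125 875
```

However large the fetched dataset is, only about 1000 events ever sit in the queue at once; the rest stay with the input until a worker takes a batch and frees space.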

