Persistent Queue Configuration question


#1

Hi,
This might be a very simple question, but I am new to Logstash.
I have just configured my logstash instance with the following configuration in the logstash.yml file:

path.data: /var/lib/logstash
queue.type: persisted
queue.max_bytes: 2gb
queue.page_capacity: 500mb
queue.max_events: 0

Logstash reads data from Redis, processes it, and sends it to an ES cluster.
When I restart Logstash, I don't see any indication that the persistent queue is being used or created. There is no 'queue' file created in the path.data directory as mentioned in the docs.
What am I doing wrong?

Thanks.


#3

Can anyone take a look at this issue please???


(Christian Dahlqvist) #4

Please be patient. This forum is manned by volunteers. If you need an SLA associated with your questions, Elastic does offer commercial subscriptions that include SLA-based support.

The feature you are asking about is also quite new, which means there is a limited number of people who may have practical experience with it and be able to help.


(Guy Boertje) #5

Have you read the docs on configuring persistent queues?

With queue.max_events set to 0 (unlimited), queue.max_bytes is the limit that will be reached first.

In /var/lib/logstash you should see a folder queue and in that folder there should be a few files, some called page.<N>, one called checkpoint.head and some called checkpoint.<N>.

Events are pushed to the "head" of the queue. Events are pulled from the tail.

If the push rate is equal to the pull rate then one or two pages will be seen on disk. A page that has been read/acknowledged (pulled) is deleted. If the push rate is higher than the pull rate then you will see a number of page.<N> files building up.
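
To watch this build-up from a script rather than by eye, something like the following minimal Python sketch could be used. It relies only on the file names described above (page.<N>, checkpoint.<N>, checkpoint.head). The function name `summarize_queue_dir` and the default path (the path.data from the first post plus `queue`) are assumptions made for illustration, not part of Logstash.

```python
import os
import re
import tempfile

# File name patterns as described in the post above.
PAGE_RE = re.compile(r"^page\.(\d+)$")
CHECKPOINT_RE = re.compile(r"^checkpoint\.(\d+)$")

# Assumed location: path.data from the first post, plus "queue".
DEFAULT_QUEUE_DIR = "/var/lib/logstash/queue"


def summarize_queue_dir(queue_dir):
    """Count page and checkpoint files in a persistent queue directory."""
    summary = {
        "pages": [],                   # page numbers found on disk
        "checkpoints": [],             # checkpoint numbers found on disk
        "has_head_checkpoint": False,  # True if checkpoint.head exists
        "page_bytes": 0,               # total size of all page files
    }
    for name in os.listdir(queue_dir):
        full_path = os.path.join(queue_dir, name)
        page = PAGE_RE.match(name)
        checkpoint = CHECKPOINT_RE.match(name)
        if page:
            summary["pages"].append(int(page.group(1)))
            summary["page_bytes"] += os.path.getsize(full_path)
        elif checkpoint:
            summary["checkpoints"].append(int(checkpoint.group(1)))
        elif name == "checkpoint.head":
            summary["has_head_checkpoint"] = True
    summary["pages"].sort()
    summary["checkpoints"].sort()
    return summary


if __name__ == "__main__":
    if os.path.isdir(DEFAULT_QUEUE_DIR):
        print(summarize_queue_dir(DEFAULT_QUEUE_DIR))
    else:
        print("no queue directory at", DEFAULT_QUEUE_DIR)
```

Running it every few seconds shows whether the list of pages stays at one or two entries (push and pull rates roughly equal) or keeps growing (push rate higher than pull rate).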

Questions:

  1. Is this a development setup?
  2. Is the data in Redis finite?
  3. How are you restarting logstash?
  4. Are you sure you are editing the logstash.yml file that LS is using?

#6

This is a dev environment.
The data in redis is finite.
I am restarting logstash with sudo service logstash restart
@guyboertje, I hadn't created the 'queue' folder before, but once I created it I was able to see two files: "checkpoint.head" and "page.<N>".
I have configured each page to 500 MB, and page.<N> is at 100 MB and gradually growing toward 500 MB. Does this mean that the data is not being sent to ES? What does this file contain, and when will it be reset to 0 MB?
Thanks


(Guy Boertje) #7

How much test data are you putting into redis?

Previously, did you see events in ES and was Redis empty?


#8

@guyboertje, I am sorry, the data in Redis is not finite. I am feeding live data to the cluster, currently at around 500-1000 events/sec.


#9

Hello @guyboertje, I have another query. Is there any way to confirm that the persistent queue exists?
I tried to get stats on the pipeline using curl -XGET 'localhost:9600/_node/stats/pipeline?pretty', and the result contains no block showing the stats of persistent queues:

"reloads" : { "last_error" : null, "successes" : 0, "last_success_timestamp" : null, "last_failure_timestamp" : null, "failures" : 0 } }
as mentioned in the docs.
Thanks.


(Guy Boertje) #10

Metrics for the persistent queue were not shipped in the initial release. They are coming soon.
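
For a release that does expose queue metrics, a check along these lines could tell whether the stats response contains a queue block at all. This is a hedged sketch: the URL is the one from the curl command in the question, while the key name `queue` and the helper names `fetch_stats` and `find_queue_blocks` are assumptions for illustration. The search is recursive so it does not depend on exactly where in the response the block appears.

```python
import json
import urllib.request

# URL taken from the curl command in the question above.
STATS_URL = "http://localhost:9600/_node/stats/pipeline?pretty"


def fetch_stats(url=STATS_URL, timeout=5):
    """Fetch and decode the node stats JSON from a running Logstash."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_queue_blocks(node, path=()):
    """Return (path, value) pairs for every key named "queue".

    The key name "queue" is an assumption about how the metrics
    are exposed; adjust it if your version uses another name.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "queue":
                found.append(("/".join(path + (key,)), value))
            else:
                found.extend(find_queue_blocks(value, path + (key,)))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_queue_blocks(item, path + (str(index),)))
    return found
```

Calling `find_queue_blocks(fetch_stats())` against a live instance returns an empty list when no such block is present, which is what the output quoted in the question would produce.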


#11

@guyboertje, I see that whenever the input load increases, multiple checkpoint files are created. When the corresponding page file has been processed, the checkpoint file should be deleted automatically, but my queue folder is filling up with multiple files.
Example:

This is stalling the Logstash shutdown or start process.


(Guy Boertje) #12

When the inflow rate of events is higher than the output, as you now know, page and checkpoint files build up.

If a worker thread is slow or very very slow (stuck) it will not have acked its batch so the page and checkpoint that the batch was read from can't be closed and deleted. But other workers will read and ack later pages. In this specific case when a later page is fully acked we only delete the page file and not its checkpoint. This means the PQ has a complete 'breadcrumb' trail from the checkpoint.head back to the unacked page and checkpoint. This allows us to tell on startup if the PQ is corrupted should pages and checkpoints be deleted by some other means.

If the slow worker does eventually loop back and ack its batch then we remove that page and checkpoint and clean the intermediate checkpoint files.
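
The clean-up rule described above can be illustrated with a toy model. To be clear, this is not the Logstash implementation: the class `ToyQueueDir` and its method names are hypothetical, the head page and checkpoint.head are not modelled, and each page is treated as acked in one step. It only shows why checkpoint files can outnumber page files while one early page is still unacked.

```python
class ToyQueueDir:
    """Toy model of page/checkpoint files; NOT the real Logstash code."""

    def __init__(self, num_pages):
        # Every page starts on disk, unacked, with its own checkpoint.
        self.pages = set(range(num_pages))        # page.<N> files
        self.checkpoints = set(range(num_pages))  # checkpoint.<N> files

    def ack_page(self, n):
        """Mark page n as fully acked and apply the clean-up rule."""
        if n not in self.pages:
            raise ValueError("page %d is not on disk" % n)
        # A fully acked page file is always deleted.
        self.pages.discard(n)
        if self.pages:
            oldest_unacked = min(self.pages)
            # Checkpoints from the oldest unacked page onward are kept,
            # preserving the 'breadcrumb' trail; older ones are cleaned.
            self.checkpoints = {
                c for c in self.checkpoints if c >= oldest_unacked
            }
        else:
            # Nothing left unacked, so no trail needs to be kept.
            self.checkpoints = set()


if __name__ == "__main__":
    q = ToyQueueDir(4)
    q.ack_page(1)  # page 0 is held by a slow worker
    q.ack_page(2)
    print(sorted(q.pages), sorted(q.checkpoints))
    q.ack_page(0)  # the slow worker finally acks
    print(sorted(q.pages), sorted(q.checkpoints))
```

In this model, while page 0 is unacked, acking pages 1 and 2 removes their page files but leaves all checkpoints in place. Once page 0 is acked, its page and checkpoint go away together with the intermediate checkpoints.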


#13

Thanks @guyboertje for the detailed description. I have been observing the behavior of the queues with a small script that reports the number of events processed by the pipeline. Whenever the persistent queues are enabled and there is a sudden spike in the incoming logs (around 1 million in 5 minutes), the pipeline stops processing events (which is expected according to the documentation). The problem is that when the input rate drops again, the pipeline does not resume processing events, and the logs in Redis keep increasing.


The above image shows the sudden spike in output events to Elasticsearch; after that, the pipeline does not start processing the logs from the Redis queue again.
I see this behaviour only when persistent queues are enabled.

The above image shows the behaviour when the persistent queues are disabled.


(Guy Boertje) #14

@ash007

This seems to be a bug. Please file a bug report at https://github.com/elastic/logstash/issues

Make sure you give us extremely detailed instructions of what you did, your versions, configs and sample data.

