Elastic Cloud Persistent Queue?

wwalker · October 16, 2023, 6:03am

I am ingesting logs from an on-prem Logstash instance into Elastic Cloud. My Logstash instance has the persistent queue enabled. I ingested a large set of data, about 50 million events, from my on-prem Elasticsearch instance using the Elasticsearch input, and saw the Logstash queue fill and then empty. The queue has been empty for five hours now, but when I search for the logs I ingested, the number of events on-prem does not match what's in the cloud. When I refresh the cloud instance, the number slowly goes up. Is there some kind of queue or cache in Elastic Cloud?

Christian_Dahlqvist · October 16, 2023, 6:21am

How do you check/verify this? Are you using the cat indices API?

No, there is no built-in queue. Depending on how you are verifying the document count, one reason for data to show up over time is that the timestamp set for the indexed data is incorrect. All timestamps indexed into Elasticsearch are in UTC, so ingesting local-timezone timestamps without a timezone specified can make data appear to show up over time, because events from a timezone ahead of UTC end up future-dated.
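To make the timezone effect concrete, here is a small Python sketch. It assumes, purely for illustration, that the source logs were written in UTC+2 local time with no offset in the timestamp string. Because Elasticsearch treats an offset-less timestamp as UTC, the stored value lands two hours after the real event.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: the logs were written in UTC+2 local
# time and the timestamp string carries no offset.
LOCAL_TZ = timezone(timedelta(hours=2))


def as_stored_without_offset(local_text: str) -> datetime:
    """Mimic an offset-less timestamp being interpreted as UTC."""
    return datetime.fromisoformat(local_text).replace(tzinfo=timezone.utc)


def actual_instant(local_text: str) -> datetime:
    """The real moment the event happened, using the known local offset."""
    return datetime.fromisoformat(local_text).replace(tzinfo=LOCAL_TZ)


event = "2023-10-16T08:00:00"
stored = as_stored_without_offset(event)
real = actual_instant(event)

# The stored value sits 2 hours after the real event.
skew = stored - real
print(skew)  # 2:00:00
```

A search bounded at "now" (such as Discover's time picker) would not include such an event until the skew had elapsed, which makes the count appear to creep upward.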

It would help if you could share a document that showed up late as well as your Logstash pipeline.

wwalker · October 16, 2023, 6:34am

I configured Logstash to pull all documents from a specific index and configured the input to also add the old index name to a new field in each event. I then open Kibana's Discover and filter both the on-prem and the Elastic Cloud deployments by that index name to get the total number.
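One way to take the time picker out of the comparison, in the spirit of the question about the cat indices API, is to count with the `_count` API, which applies no time filter unless you add one. The Python sketch below only builds the request. The deployment URL, index pattern, field name `source_index` and its value are hypothetical placeholders, since the thread does not name the field. It also assumes the field is mapped as `keyword`; if it was dynamically mapped as `text`, query its `.keyword` subfield instead.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your deployment URL, index pattern,
# and the field your Logstash input adds (the thread does not name it).
ES_URL = "https://my-deployment.es.example.com"
INDEX_PATTERN = "logs-*"
SOURCE_FIELD = "source_index"


def build_count_request(base_url: str, index: str, field: str, value: str) -> urllib.request.Request:
    """Build a _count request that filters only on the copied index name.

    No time-range clause is included, so documents with skewed
    (e.g. future-dated) timestamps are still counted.
    """
    body = {"query": {"term": {field: value}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_count_request(ES_URL, INDEX_PATTERN, SOURCE_FIELD, "old-index-2023.10")
print(req.full_url)
print(req.data.decode())
# Sending it also needs credentials (e.g. an API key header); the response
# body is JSON containing a "count" field.
```

Running the same query against both deployments gives two numbers that do not depend on the time picker.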

I'm checking my Logstash persistent queue using Kibana's monitoring page. However, I just looked at the pipeline metrics, and they show the Elasticsearch input is still pulling in documents, at a significantly slower rate. The initial ingest rate was about 4,100 events/second, but it tailed off to about 900 events/second. I wonder why it's doing that...
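A persistent queue only fills when events arrive faster than the pipeline workers drain them, so an empty queue does not by itself mean the input has finished. The toy model below illustrates this. It is a simplification, not how Logstash schedules work internally, and the drain rate of 3,000 events/s is a hypothetical figure, not one reported in the thread.

```python
def simulate_queue(input_rates, output_rate):
    """Return queue depth after each second.

    Simplified model (an assumption, not Logstash internals): each second
    the input pushes `rate` events and the workers drain up to
    `output_rate` events from the queue.
    """
    depth = 0
    depths = []
    for rate in input_rates:
        depth += rate
        depth -= min(depth, output_rate)
        depths.append(depth)
    return depths


# Input starts at ~4,100 events/s, then slows to ~900 events/s.
rates = [4100] * 3 + [900] * 3
print(simulate_queue(rates, output_rate=3000))
# [1100, 2200, 3300, 1200, 0, 0]
```

Once the input rate drops below what the outputs can absorb, the queue sits at zero even though events are still flowing in, which matches an empty queue alongside a slowly rising document count.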

Christian_Dahlqvist · October 16, 2023, 7:13am

Are you specifying the document ID in your Elasticsearch output? If you do, each insert will be treated as a potential update, which often slows down indexing throughput as the destination index grows.
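The difference shows up in the bulk requests sent to Elasticsearch; in the Logstash `elasticsearch` output it depends on whether `document_id` is set. The Python sketch below builds an NDJSON `_bulk` payload both ways. The index name and the `event_id` field are hypothetical. With an auto-generated ID, Elasticsearch can skip the per-document ID lookup, whereas an explicit `_id` means it first has to check whether a document with that ID already exists.

```python
import json


def bulk_body(index, docs, id_field=None):
    """Build an NDJSON _bulk payload of "index" actions.

    With id_field=None no _id is sent and Elasticsearch generates one.
    With id_field set, each action carries an explicit _id, so the write
    may overwrite (update) an existing document with that ID.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


docs = [{"event_id": 1, "message": "a"}, {"event_id": 2, "message": "b"}]
print(bulk_body("logs-migrated", docs))
print(bulk_body("logs-migrated", docs, id_field="event_id"))
```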

wwalker · October 16, 2023, 1:21pm

At the moment, I am not.

wwalker · October 23, 2023, 4:00pm

I managed to flatten the curve and increase throughput to the point where my cloud instance is now the bottleneck. The VM currently has 8 vCPUs and 32 GB of RAM. I raised pipeline.workers to 32, set the Elasticsearch input's size to 7500, and set scroll to 8m.
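A back-of-envelope look at what `size` and `scroll` mean for this run, using figures from the thread (45 million documents, `size` 7500, `scroll` 8m, final rate 3,400 e/s). It treats each scroll page as being consumed at the final pipeline rate, which is a simplifying assumption.

```python
import math

TOTAL_DOCS = 45_000_000       # documents processed, from the thread
PAGE_SIZE = 7_500             # Elasticsearch input `size`
SCROLL_KEEPALIVE_S = 8 * 60   # `scroll` of 8m, in seconds
SLOWEST_RATE = 3_400          # final events/s reported in the thread

pages = math.ceil(TOTAL_DOCS / PAGE_SIZE)
seconds_per_page = PAGE_SIZE / SLOWEST_RATE
headroom = SCROLL_KEEPALIVE_S / seconds_per_page

print(pages)                          # 6000
print(round(seconds_per_page, 2))     # 2.21
print(round(headroom))                # keepalive / time per page
print(math.ceil(TOTAL_DOCS / 1_000))  # 45000 pages at size 1000
```

At roughly 2.2 seconds per page, an 8-minute keepalive leaves a wide margin, so under this model the larger page size mainly helps by cutting round trips: 6,000 scroll requests instead of 45,000 at the input's default `size` of 1000.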

Initial ingest now hits 9,400 e/s, with a final rate of 3,400 e/s: about 2.3 and 3.8 times higher than before, respectively. Can't complain about those gains. Logstash processed 45 million documents in about 2 hours.
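The improvement figures can be recomputed from the reported rates (4,100 and 900 e/s before tuning, 9,400 and 3,400 e/s after); the last lines turn "45 million documents in about 2 hours" into an average rate.

```python
# Rates reported in the thread, in events/second.
before = {"initial": 4_100, "final": 900}
after = {"initial": 9_400, "final": 3_400}

for phase in ("initial", "final"):
    print(phase, round(after[phase] / before[phase], 2))
# initial 2.29
# final 3.78

# 45 million documents in about 2 hours, as an average rate.
average_rate = 45_000_000 / (2 * 3600)
print(average_rate)  # 6250.0
```

The average of about 6,250 e/s falls between the initial and final rates, as expected for a rate that tails off over the run.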

system · November 20, 2023, 4:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Regarding Logstash Persistent Queus	Logstash	1	259	June 22, 2018
Logstash persistent queue empty but all events coming in fine	Logstash	6	1146	April 27, 2021
Logstash: Persistent Queue Behaviour	Logstash	4	767	February 22, 2021
Logstash Data Persistency	Logstash	1	251	March 11, 2021
Persisted queue never clearing	Logstash	1	2334	March 2, 2017