I am ingesting logs from an on-prem Logstash instance into Elastic Cloud. My Logstash instance has the persistent queue enabled. I ingested a large data set, about 50 million events, from my on-prem Elasticsearch cluster using the Elasticsearch input, and watched the Logstash queue fill and then empty. The queue has now been empty for five hours, but when I search for the ingested logs, the number of events on-prem does not match what's in the cloud. When I refresh the cloud instance, the number slowly goes up. Is there some kind of queue or cache in Elastic Cloud?
How do you check/verify this? Are you using the cat indices API?
No, there is no built-in queue. Depending on how you are verifying the document count, one reason for data to show up over time is an incorrect timestamp on the indexed data. Elasticsearch stores all timestamps in UTC, so ingesting local-time timestamps without a timezone specified may leave documents future-dated, and they then appear to show up gradually as the clock catches up with them.
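A minimal Python sketch of how the future-dating happens, assuming the logs were written in a zone two hours ahead of UTC (UTC+2) and the timestamp string carries no offset. The timestamp value and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local zone is UTC+2 and the raw string has no offset.
LOCAL_TZ = timezone(timedelta(hours=2))
RAW_TIMESTAMP = "2021-03-01 09:00:00"

def parse_naive(raw: str) -> datetime:
    """Parse the offset-less string into a naive datetime."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

def true_instant_utc(raw: str, tz: timezone) -> datetime:
    """The moment the event really happened, expressed in UTC."""
    return parse_naive(raw).replace(tzinfo=tz).astimezone(timezone.utc)

def stored_if_assumed_utc(raw: str) -> datetime:
    """What gets indexed if the offset-less string is treated as UTC."""
    return parse_naive(raw).replace(tzinfo=timezone.utc)

skew = stored_if_assumed_utc(RAW_TIMESTAMP) - true_instant_utc(RAW_TIMESTAMP, LOCAL_TZ)
print(true_instant_utc(RAW_TIMESTAMP, LOCAL_TZ).isoformat())  # 2021-03-01T07:00:00+00:00
print(stored_if_assumed_utc(RAW_TIMESTAMP).isoformat())       # 2021-03-01T09:00:00+00:00
print(skew)                                                   # 2:00:00
```

The stored value lies two hours after the real event, so a Discover time range ending at "now" would not include that document until 09:00 UTC. With a zone behind UTC the skew goes the other way and documents are past-dated instead.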
It would help if you could share a document that showed up late as well as your Logstash pipeline.
I configured Logstash to pull all documents from a specific index and configured the input to also add the old index name to a new field in each event. I then open Kibana's Discover on both the on-prem and the Elastic Cloud clusters and filter by that index name to get the total number.
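A count that does not depend on Discover's time picker can be taken with the Elasticsearch `_count` API, whose response contains a `count` field. The Python sketch below only builds the request path and body; the field name `source_index`, the index pattern, and the old index name are hypothetical, and the term query assumes the field is mapped as `keyword` (for a `text` field with a keyword sub-field, query `source_index.keyword` instead).

```python
import json

# Hypothetical name of the field the Logstash input filled with the old index name.
SOURCE_FIELD = "source_index"

def count_request(index_pattern: str, old_index: str):
    """Return (path, body) for an Elasticsearch _count request.

    Unlike Discover, _count applies no time range, so documents whose
    timestamps fall outside the picker window (including future-dated
    ones) are still counted.
    """
    path = f"/{index_pattern}/_count"
    body = {"query": {"term": {SOURCE_FIELD: old_index}}}
    return path, json.dumps(body)

path, body = count_request("logs-*", "old-index-2021.03")
print("GET", path)  # GET /logs-*/_count
print(body)         # {"query": {"term": {"source_index": "old-index-2021.03"}}}
```

Sending the same request to both clusters and comparing the returned `count` values separates "documents not indexed yet" from "documents indexed but outside the time range".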
I'm checking my Logstash persistent queue using Kibana's monitoring page. However, I just looked at the pipeline metrics, and the Elasticsearch input is still pulling in documents, just at a significantly slower rate. The initial ingest rate was about 4,100 events/second, but it tailed off to about 900 events/second. I wonder why it's doing that.
Are you specifying the document ID in your Elasticsearch output? If you do, each insert is treated as a potential update, which often slows down indexing throughput as the destination index grows.
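To make the difference concrete, here is a Python sketch that builds the NDJSON body of an Elasticsearch `_bulk` request with and without an explicit `_id` (the explicit case corresponds roughly to setting `document_id` in the Logstash Elasticsearch output). The index name `logs-restored` and the field `event_id` are hypothetical.

```python
import json
from typing import Optional

def bulk_body(index: str, docs: list, id_field: Optional[str] = None) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Without id_field, Elasticsearch generates the _id and can skip checking
    for an existing document. With id_field, the value is sent as _id, so
    every write may overwrite (update) a document that already exists.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))              # source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

docs = [{"event_id": 1, "msg": "a"}, {"event_id": 2, "msg": "b"}]
print(bulk_body("logs-restored", docs))
print(bulk_body("logs-restored", docs, id_field="event_id"))
```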
At the moment, I am not.
I managed to flatten the curve and increase throughput to the point where my cloud instance is now the bottleneck. The VM has 8 vCPUs and 32 GB of RAM. I raised `pipeline.workers` to 32, set the Elasticsearch input's `size` to 7500, and set `scroll` to `8m`.
Ingest now starts at 9,400 e/s and finishes at 3,400 e/s, about 2.29 and 3.78 times higher than before, respectively. Can't complain about those gains. Logstash processed 45 million documents in about 2 hours.
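A quick Python check of the figures quoted in this thread: the speed-up ratios and the average rate implied by 45 million documents in roughly 2 hours (the 2 hours is approximate, so the average is too).

```python
# Rates quoted in the thread, in events/second, before and after tuning.
before = {"initial": 4_100, "final": 900}
after = {"initial": 9_400, "final": 3_400}

ratios = {phase: round(after[phase] / before[phase], 2) for phase in before}
print(ratios)  # {'initial': 2.29, 'final': 3.78}

# Average rate implied by 45 million documents in about 2 hours.
avg_rate = 45_000_000 / (2 * 60 * 60)
print(avg_rate)  # 6250.0
```

The implied average of about 6,250 e/s sits between the starting and finishing rates, which is consistent with a throughput that declines over the run.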