I am moving my logs from an on-prem instance up to Elastic Cloud. I have set up a Logstash pipeline with a very large persistent queue, like below, to push events up to the Cloud.
Logstash pulls the events very quickly at first but then slows down to a trickle. Here's what the elasticsearch input looks like; it ingests events for about 4-5 hours.
Any ideas why this happens, or anything I can do to shorten this... slope?
Logstash pipelines work with 1*x + n threads, where x is the number of input {} blocks configured in the pipeline and n is the number of workers assigned to the pipeline (pipeline.workers in pipelines.yml or logstash.yml). The n workers handle the filter {} and output {} portion of your pipeline.
IO is expensive and usually slow, which is why splitting up inputs can benefit your processing speed, as they will work in parallel. The workers then handle both inputs in the pipeline with the same filter and output.
The output is again IO (network traffic), so it is slow as well; while one call to Cloud is in flight and waiting for a response, another worker can make its own call, creating a parallel processing stream.
Play around with multiple inputs (x) and the number of workers assigned to the pipeline (n) to see if you can improve the throughput, keeping in mind the number of CPUs assigned to the node handling the pipeline, of course. A rough sketch of what that could look like is below.
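Something like this, purely as an illustration (hosts, index patterns, and Cloud credentials are placeholders, and the worker count is just an assumed starting point for an 8-vCPU node):

```
# pipelines.yml - per-pipeline settings (values here are assumptions, tune to your hardware)
- pipeline.id: onprem-to-cloud
  path.config: "/etc/logstash/conf.d/onprem-to-cloud.conf"
  pipeline.workers: 8          # n: threads running filter {} + output {}
  queue.type: persisted        # the large persistent queue mentioned above
```

```
# onprem-to-cloud.conf - two inputs (x = 2) reading non-overlapping index patterns in parallel
input {
  elasticsearch {
    hosts => ["https://onprem-es:9200"]   # placeholder host
    index => "logs-2023.0*"               # placeholder index pattern
  }
  elasticsearch {
    hosts => ["https://onprem-es:9200"]
    index => "logs-2023.1*"               # a second, non-overlapping pattern
  }
}
output {
  elasticsearch {
    cloud_id => "${CLOUD_ID}"             # placeholder Elastic Cloud credentials
    cloud_auth => "${CLOUD_AUTH}"
  }
}
```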
Fantastic explanation! You started with the math and my brain went 'oh no...' and then you used words and it was all like 'awww yaaa'.
I have one additional input that grabs events sent from the Elastic Agent. My filter section consists of a couple of basic mutate filters that put some information at the beginning of each message field, and then I parse out the message field using a kv filter. From there I use a few filters to add my custom fields and then finally delete the parsed-out fields. The parsed fields are stuck into an object, so I just reference [temp] to delete the temp fields.
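Roughly along these lines (the field names and values here are placeholders, not my exact config):

```
filter {
  mutate {
    # prepend some context to the message field (placeholder value)
    replace => { "message" => "site=dc1 %{message}" }
  }
  kv {
    source => "message"
    target => "[temp]"          # parsed key/value pairs land under [temp]
  }
  mutate {
    # build custom fields from the parsed values (placeholder field names)
    add_field => { "[event][action]" => "%{[temp][action]}" }
    # then drop the temporary object
    remove_field => [ "[temp]" ]
  }
}
```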
I'm running a single pipeline on a VM with 8 vCPUs assigned to it. Looks like under full load, Logstash tops out around 45% CPU usage.
Thanks for that. I read through it and, while it does explain why this tailing off occurs, it doesn't look like there's much I can do. While I have many indices to pull data from, I am doing a single index at a time and each index only has a single shard. That would seem to rule out slicing, since the documentation states that configuring more slices than shards can be detrimental. It seems like my only option may be to change the page size.
The default page size is 1000. If the tailing off of performance is because of page depth, and I set a page size of 2000, would the tail-off take twice as long to show up, since it would take twice as many documents to reach a given page depth? Or am I misunderstanding something?
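For reference, the page size I'm talking about is the size option on the elasticsearch input, so doubling it would look something like this (host and index are placeholders):

```
input {
  elasticsearch {
    hosts => ["https://onprem-es:9200"]   # placeholder host
    index => "logs-2023.01"               # one single-shard index at a time
    size => 2000                          # default is 1000 per page
    scroll => "5m"                        # keep the scroll context alive a bit longer
    # slices left unset: more slices than shards can be detrimental
  }
}
```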