I am really not sure if this is a Logstash issue or an Elasticsearch issue.
I have Logstash set up with 3 pipelines, each with its own Filebeat input and each receiving logs from a different website. Logstash then does its thing with a bunch of filters, and the output goes into one of the following indices: [site]-http-%{[@metadata][mode]}-%{+YYYY.MM.dd} or failures-%{+xxxx.ww}, where mode is either user or bot, and any event that hits a filtering error goes to failures. So basically each pipeline writes to ~3 indices.
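To give a concrete picture, the output section of each pipeline is roughly along these lines (a simplified sketch: the _filter_error tag, the localhost host, and the site1 prefix are stand-ins, not my exact config):

```
output {
  if "_filter_error" in [tags] {
    # anything that failed filtering goes to the weekly failures index
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "failures-%{+xxxx.ww}"
    }
  } else {
    # normal events are split by site and by user/bot mode
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "site1-http-%{[@metadata][mode]}-%{+YYYY.MM.dd}"
    }
  }
}
```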
When only one pipeline is being fed data by Filebeat everything is fine and I can get 2000-3000 events per second ingested. But when I turn more than one on, my overall ingestion rate drops to around 500-600 events per second.
That level is fine for keeping up with my normal traffic, but it makes it very hard to pull in old data or clean up issues when certain things fall apart.
I have looked at a lot of the performance documentation and recommendations. I tried explicitly setting the number of workers per pipeline, but that does not seem to have had any effect. I have also tried a bunch of other things to narrow down the issue, but I could really use some outside guidance.
Which version of the stack are you using? What type of storage are you using? How many shards does each index have? Have you looked at I/O statistics during indexing to see if disk performance might be the bottleneck? Have you tried increasing the internal Logstash batch size to achieve larger bulk requests?
Elasticsearch and Logstash 6.4.0.
Old mechanical spinning disks in a software RAID 6 array. I know this is not ideal, but I do not have a lot of machines to dedicate to my ELK stack, and I was just trying to reuse retired 4 TB drives that were replaced by 8 TB drives on some of our unrelated content servers.
As you can see from the graph, my event latency spikes up around 8:28. That is when I tried to add another Filebeat instance to feed in some missing log files from yesterday. At 8:34 is when I turned off the other 3 Filebeat instances so that only the one filling in the missing data was running.
The I/O is high... but I didn't think one index receiving 1000 events/s versus 10 indices receiving 100 events/s each would have such different I/O implications.
I will try increasing the Logstash batch size and see if that helps.
Writing to a lot of shards can result in a lot of small writes in many different places, which can be a problem for slow storage. If you are on Linux, use iostat -x to get a view of I/O load as well as await during indexing and see if this changes when you add additional pipelines.
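For example, assuming the sysstat package is installed, something like the following prints extended per-device statistics on a rolling basis so you can watch await and %util while indexing:

```
# extended device stats, refreshed every 5 seconds
iostat -x 5
```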
One additional question: Are you allowing Elasticsearch to automatically assign document IDs?
Okay, thanks, I think I am getting closer. I added pipeline.batch.size, pipeline.batch.delay, and pipeline.workers for each of my pipelines and it seems much better.
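For anyone with a similar setup, this went into pipelines.yml, roughly like the entry below (the pipeline id, config path, and numbers are just illustrative; tune them for your own hardware):

```
# pipelines.yml - per-pipeline tuning (example values)
- pipeline.id: site1
  path.config: "/etc/logstash/conf.d/site1.conf"
  pipeline.workers: 2
  pipeline.batch.size: 1000   # larger batches -> larger bulk requests to Elasticsearch
  pipeline.batch.delay: 50    # ms to wait for a batch to fill before flushing
```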