I have other pipelines running as well, but this pcap-gy pipeline is the one giving me trouble. Docs from 2 days ago still haven't been indexed. From what I measured, on average only around 200K JSON docs are processed every 35 seconds. As a test I changed the output to /dev/null and the average dropped to 30 seconds, so Elasticsearch might not be the bottleneck.
BTW, playing with batch.size did not help. Should I keep it at 4096?
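To put those two measurements side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (about 200K docs per 35 s with the Elasticsearch output, per 30 s with /dev/null); the variable names are just for illustration.

```python
# Rough throughput comparison using the figures quoted above.
DOCS_PER_INTERVAL = 200_000  # approximate docs processed per measured interval


def docs_per_second(docs, seconds):
    """Average processing rate over one measured interval."""
    return docs / seconds


to_es = docs_per_second(DOCS_PER_INTERVAL, 35)    # output to Elasticsearch
to_null = docs_per_second(DOCS_PER_INTERVAL, 30)  # output to /dev/null

# Relative gain in rate when Elasticsearch is taken out of the picture.
gain = (to_null - to_es) / to_es * 100

print(f"to Elasticsearch: {to_es:,.0f} docs/s")
print(f"to /dev/null:     {to_null:,.0f} docs/s")
print(f"gain without ES:  {gain:.1f}%")
```

That is roughly 5.7K versus 6.7K docs/s, so removing Elasticsearch from the path buys less than 20%, which is consistent with the input or filter side being the main limit.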
Well, you would need to share them as well. It is hard to troubleshoot something without the full context; until now we assumed that you had only one pipeline running on your Logstash instance.
The pipelines are executed in the same instance, so any of them can impact performance.
One thing that I don't think was mentioned: what Logstash version are you using? Please share this information as well.
Yeah, this can be an issue. The file input is single-threaded, if I'm not wrong, and in my experience Logstash does not perform well when you have a huge number of files in a path.
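If it helps to quantify "huge", a small sketch like the one below counts how many files a glob pattern matches. The pattern shown is purely hypothetical; substitute whatever path your file input actually uses.

```python
import glob


def count_matching_files(pattern):
    """Count the paths matched by a glob pattern ('**' recurses into subdirectories)."""
    return len(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Hypothetical pattern, not taken from the thread; replace with your own.
    pattern = "/data/pcap/**/*.json"
    print(f"{count_matching_files(pattern)} files match {pattern}")
```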
You should change relatime to noatime, as relatime can impact performance and also shorten the life of your SSD. You can read more about it here.
data=ordered can also have some impact on performance. It would be best to use data=writeback, but I'm not sure if you can just change this in the mount options.
You can read about the differences here on the Data Mode part.
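To see which options are actually in effect, a sketch along these lines parses text in the /proc/mounts format and returns the options for one mount point. It assumes a Linux host; the sample line (device, mount point, ext4) is a made-up placeholder, not your real configuration.

```python
# Hypothetical sample in /proc/mounts format: device, mount point, fs type, options, dump, pass.
SAMPLE = "/dev/sdd1 /data ext4 rw,relatime,data=ordered 0 0\n"


def mount_options(mounts_text, mount_point):
    """Return the options of mount_point as a dict, or None if it is not listed."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")  # flags such as 'rw' get an empty value
            opts[key] = value
        found = opts  # a later entry for the same mount point shadows earlier ones
    return found


if __name__ == "__main__":
    opts = mount_options(SAMPLE, "/data")
    print("noatime set:", "noatime" in opts)
    print("data mode:  ", opts.get("data", "(not shown)"))
```

On a real system you would pass in the contents of /proc/mounts instead of SAMPLE, together with the mount point that holds the files Logstash reads.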
Were you able to make a comparison on another Logstash server, with the output set to /dev/null, to get a feel for what Logstash on a single server can do with your data and your Logstash configuration?
For iostat, I meant running it for an extended period while the Logstash import is ongoing, outputting say every 10 seconds for a period of say 10 minutes, and looking specifically at device sdd: "iostat -x 10".
My concern here is that you/we are not getting much closer to understanding whether Elasticsearch, Logstash, or the storage is the bottleneck.
My hunch remains that you would be better served by scaling things horizontally. I say that having never in my life run any Elastic tool on a system with this much RAM; I have only used such systems for massive Oracle or other SQL databases.