I'm doing an internship at a hosting company, and I have a question about what gives us the best performance. The end goal is to pipe all Apache and PHP log files into Logstash.
The options are:
Push all Apache log files into one file (locally) and read that file with Filebeat.
Let Filebeat watch every location (150+) and push the logs to Logstash.
Is there any difference?
Additional question:
Is there any advantage to using a module like the "apache module" in Filebeat to send the logs to Logstash, or should I send them directly to Elasticsearch?
Filebeat modules offload parsing to the ingest node in Elasticsearch. If you want to do your own processing in Logstash, you're better off using plain inputs directly.
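For illustration, here is a minimal sketch of the plain-input route (the hostnames and paths below are placeholders, not taken from your setup):

```yaml
# filebeat.yml - raw lines go to Logstash, where you do your own parsing
filebeat.inputs:
  - type: log
    paths:
      - /var/log/apache2/access.log

output.logstash:
  hosts: ["logstash.example.com:5044"]
```

The module route would instead enable the apache module and point the output at Elasticsearch, so its ingest pipelines do the parsing:

```yaml
# filebeat.yml - apache module, parsing handled by the ingest node
filebeat.modules:
  - module: apache
    access:
      enabled: true
      var.paths: ["/var/log/apache2/access.log*"]

output.elasticsearch:
  hosts: ["http://elasticsearch.example.com:9200"]
```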
150+ files is a lot of files. You should run some tests, though. Too many files in one directory can be a problem, but I don't think 150 files is too many yet. In the end it depends on a few factors, such as the OS, the filesystem, and the actual log write patterns over time.
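If those 150+ locations follow a predictable directory layout, you can also cover them with a glob instead of listing every path; a sketch, assuming a hypothetical per-vhost layout:

```yaml
filebeat.inputs:
  - type: log
    paths:
      # one glob instead of 150+ explicit entries; the layout is an assumption
      - /var/www/*/logs/access.log
      - /var/www/*/logs/error.log
```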
Push all Apache log files into one file (locally) and read that file with Filebeat
It's an option, but there are common pitfalls with this practice:
Combined logs should all have the same format, to simplify processing/filtering.
Do not intermix multiline logs in one file; you might not be able to reconstruct those multiline events properly.
I guess the latter is more of a concern for the stack traces in your PHP logs; see the multiline sketch below.
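If the PHP logs do end up in the stream, Filebeat's multiline settings can stitch stack traces back together at the source. A sketch, assuming the stock PHP error-log continuation lines ("Stack trace:", "#0 ...", "thrown in ..."):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/php_errors.log   # path is an assumption
    # append stack-trace continuation lines to the preceding event
    multiline.pattern: '^(Stack trace:|#[0-9]+ |  thrown in )'
    multiline.negate: false
    multiline.match: after
```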
My advice is to always test and get an idea of how the different options perform. Do not optimise/tune if the system already operates/performs up to expectations. The simpler the setup and the less you have to modify, the better.