Tail -f new system files (not constant) to Logstash


Using a Python infrastructure:
I am already familiar with putting data into Logstash via the Python module "logstash_async"; it works excellently. As part of a test process on several nodes (machines), I now need to give users the ability to capture system logs such as "dmesg" by running "tail -f" on these files for a period of time and pushing the output into Logstash. Is there a way to do that?

Hey @Dawood,

Did you try Filebeat? It can be used exactly for that: it watches files and sends their contents to Logstash or Elasticsearch. It also has modules to parse well-known log formats.

Thanks Jaime. These files are dynamic; there can be hundreds of them across many machines, and there is no need to listen to all of them. Only when a user decides to tail -f one of them do I want to give them the ability to redirect that output to Logstash. I know that I should configure Logstash as well. Hopefully I am clearer now.

How do users notify the system that they want to tail -f a file to Logstash?

Motivation: write a class or module in my Python infrastructure that will use Filebeat or the logstash_async module to post the tail -f output of any file on any system, driven from the users' Python scripts that use my infra. For example, they hit a real error/failure on a machine, and then they want to read that machine's dmesg and send the log to Logstash.

Hey @Dawood,

Observability solutions usually work in a different way. They collect the data as soon as it is available on the edge systems and send it to a central point. With Beats you can easily send this data (logs, metrics, traces...) to Elasticsearch. Once in Elasticsearch, it can be visualized or analyzed in any way. The one disadvantage is that you have all the data in Elasticsearch, but this comes with many advantages:

  • When you have an infrastructure with many systems, it is good practice to send the logs off the machines as soon as possible, so that if a machine is lost or compromised in any way, the logs are still available in a central point of trust.
  • The previous point is doubly important if you use containers or virtual machines: when you delete them, you may lose their logs if you haven't sent them to a central point.
  • You don't have to worry about keeping logs on your systems for a long time so they can be read later.
  • Having all the data in a central point lets you run queries and aggregations much faster than reading them system by system. Imagine, for your use case, that your users want a visualization showing on how many systems an error is happening, or with what frequency: this is easy if you can search across all the logs of all your machines from a central point, but complex if you have to query system by system and then somehow aggregate the data.
  • The logs' lifecycle can be decided from a central point, e.g. how long you want to keep your data. If you want to keep logs for a longer time, you don't depend on the capacity of each of your systems; you can scale Elasticsearch as you go.
  • You can extract information such as timestamps, hostnames, requests done... from your logs.

If you want to give this approach a try, you only need to install Filebeat on some of your systems, configured with Elasticsearch as its output. Then you can use, for example, the Kibana Logs UI to visualize the logs from a central point. Take a look at https://www.elastic.co/log-monitoring
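For that setup, a minimal filebeat.yml sketch might look like the following. Host names and file paths are placeholders, and the input type name can differ between Filebeat versions, so check the Filebeat documentation for your version:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/dmesg*           # placeholder pattern for files to watch

output.elasticsearch:
  hosts: ["my-elasticsearch:9200"]  # placeholder Elasticsearch host
```

With Logstash in between instead, the output section would use output.logstash with the Logstash host and its Beats port.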

Thanks a lot for the elaboration. It is clearer to me now, and I understand that I should install Filebeat on each system whose logs should be read. Right?

Yes, in my opinion this would be the way to go if possible.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.