We use Elasticsearch for storing logs in our infrastructure. The infrastructure is composed of multiple workers that write logs to a central Elasticsearch server, and the number of workers can grow as the volume of data increases. I was wondering what would be an optimal configuration for writing data to Elasticsearch with Filebeat. Should each worker get its own log writer that uses Filebeat, or should I use some third-party mechanism to ship the logs somewhere from which they can be indexed into Elasticsearch?
Having a single Filebeat instance on each node that reads all the different log files sounds like a good plan to me.
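As a rough sketch of what that per-node setup could look like (the paths and host below are placeholders, adjust them to your environment):

```yaml
# filebeat.yml -- one instance per node, tailing all local log files
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log    # hypothetical application log path

output.elasticsearch:
  hosts: ["http://elasticsearch.example:9200"]   # hypothetical central ES host
```

Each worker node runs the same config, so scaling out just means starting another node with Filebeat installed.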
I do not fully understand the difference between a worker and a writer, or their relationship to a Filebeat instance (or a node). Maybe you can explain.
From an architectural perspective, there are a couple of ways you can go.

Have the beats write directly to Elasticsearch. For Filebeat this is fine: even in case of an outage, Filebeat remembers where it stopped reading in a file and resumes from there. Other beats that collect statistics can spool some data to disk, but will ultimately drop data if Elasticsearch is not available.

Alternatively, you could add another component in between that collects data from several beats and persists it temporarily on disk, such as Logstash with a persistent queue or Kafka. That would also mean you are not reliant on the indexing speed of Elasticsearch in case of a surge of logs (think DoS). But as with everything, this comes with the cost of maintaining it, so I would always start small and grow over time.
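To make the buffered variants concrete, here is a sketch of both options; the broker address and topic name are assumptions, not prescriptions:

```yaml
# filebeat.yml (output section) -- ship to Kafka instead of Elasticsearch,
# so a downstream consumer can index at its own pace
output.kafka:
  hosts: ["kafka1.example:9092"]   # hypothetical broker
  topic: "app-logs"                # hypothetical topic

# Or, if you put Logstash in between, enable its persistent queue in
# logstash.yml so events are buffered on disk while Elasticsearch is
# slow or unavailable:
#   queue.type: persisted
#   queue.max_bytes: 4gb
```

Either way, the beats stay simple and the buffering concern lives in one dedicated component.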
Hope this helps!