I am new to the ELK stack. I am planning to stream logs from multiple application servers into S3 and then have Logstash read them from there, so I need some advice on what that architecture could look like. I have gone through several pieces of documentation and online resources, but I am still unsure which architecture to choose when S3 is involved.
If I were to use Filebeat on the application servers, my architecture could look like this:
App server (Filebeat) --> Queue (Redis) --> Logstash --> Elasticsearch --> Kibana
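The queue in this layout decouples the shippers from Logstash: shippers keep pushing events even when Logstash is down, and Logstash drains the backlog when it comes back. The sketch below is a minimal in-memory simulation of that idea, assuming one event arrives per tick and that a plain `deque` stands in for a Redis list; `simulate`, `logstash_up_ticks` and `batch_size` are hypothetical names, not part of Filebeat, Redis or Logstash.

```python
from collections import deque


def simulate(events, logstash_up_ticks, batch_size=2):
    """Push one event per tick into a broker; Logstash only consumes on ticks when it is up.

    The deque is a hypothetical in-memory stand-in for a Redis list.
    Returns the events in the order they were "indexed" and the largest backlog seen.
    """
    broker = deque()
    indexed = []  # what would end up in Elasticsearch
    max_backlog = 0
    for tick, event in enumerate(events):
        broker.append(event)  # the shipper keeps pushing even while Logstash is down
        max_backlog = max(max_backlog, len(broker))
        if tick in logstash_up_ticks:
            # Logstash consumes at most one batch per tick.
            for _ in range(min(batch_size, len(broker))):
                indexed.append(broker.popleft())
    # Logstash stays up from here on and drains the remaining backlog in order.
    while broker:
        indexed.append(broker.popleft())
    return indexed, max_backlog


if __name__ == "__main__":
    done, backlog = simulate(["e1", "e2", "e3", "e4", "e5"], logstash_up_ticks={0, 4})
    print(done, backlog)
```

In this run Logstash is only up on ticks 0 and 4, yet every event is eventually indexed in order; the cost is that the backlog (here up to 4 events) has to live somewhere, which is the broker's memory or disk.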
But since, in my use case, the logs are already being streamed to S3, do I really need a queuing system like Redis? Since it is Logstash that pulls the data from S3, it knows what it has and has not already pulled, so no log event should be missed. Please let me know if my understanding is wrong.
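The pull model described above can be illustrated with a minimal sketch, assuming each S3 object is visible as a key plus a last-modified timestamp. The consumer remembers the newest timestamp it has processed (the Logstash S3 input keeps comparable state in its sincedb file), so objects written while it was stopped are still picked up on the next poll. `S3Object` and `poll_once` are hypothetical names, and no AWS SDK is used; the listing is just an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of an S3 bucket listing."""
    key: str
    last_modified: datetime


def poll_once(listing, checkpoint):
    """Return objects newer than the checkpoint (oldest first) and the advanced checkpoint.

    A checkpoint of None means nothing has been processed yet.
    """
    new_objects = sorted(
        (obj for obj in listing if checkpoint is None or obj.last_modified > checkpoint),
        key=lambda obj: obj.last_modified,
    )
    if new_objects:
        # Persisting this value is what lets a restarted consumer resume without gaps.
        checkpoint = new_objects[-1].last_modified
    return new_objects, checkpoint


if __name__ == "__main__":
    t = lambda hour: datetime(2024, 1, 1, hour, tzinfo=timezone.utc)
    bucket = [S3Object("app1/a.log", t(2)), S3Object("app2/b.log", t(1))]
    batch, cp = poll_once(bucket, None)
    print([o.key for o in batch], cp)
    bucket.append(S3Object("app1/c.log", t(3)))  # uploaded while the consumer was down
    batch, cp = poll_once(bucket, cp)
    print([o.key for o in batch], cp)
```

This is a simplification: a checkpoint based only on timestamps can skip an object that appears later with exactly the same last-modified time as the checkpoint, so a real consumer needs some extra bookkeeping for that edge case.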
So, I am planning to start a production setup with the following architecture:
S3 (multiple directories) --> Logstash (1 server) --> Elasticsearch (3 nodes) --> Kibana (1 server with an Elasticsearch client node)
I have the following questions about this architecture:
- To get better latency (the time between an event being logged on an app server and that event being indexed in ES), which is the safer option: S3 or Filebeat?
- If I go with S3 (even if it is slower), do I need a queuing system? If I do, can I put it on the same server as Logstash?
- If I use only one Logstash instance, does that put at risk its ability to act as a real-time log analysis system? Should I use two Logstash instances so that, if one instance goes down, the other can still read logs from S3?
- If I use two Logstash instances, should I use two queues, or would one queue be enough to give me high availability?
- For a load of 25 GB/day (50 GB/day with replication) and a retention of 60 days, would it be OK to start the cluster with 3 nodes, i.e. 2 master+data nodes and 1 dedicated master? Or do I need 5 nodes (3 dedicated masters and 2 data nodes)?
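The sizing in the last question can be worked through with a small calculation. This sketch assumes that the on-disk index size roughly equals the raw log volume (in practice it varies with mappings and compression), that shards spread evenly over the data nodes, and that a master election needs a strict majority of the master-eligible nodes. The function names are hypothetical helpers, not Elasticsearch settings.

```python
def total_storage_gb(daily_primary_gb: float, replicas: int, retention_days: int) -> float:
    # Every primary shard is stored (1 + replicas) times across the cluster.
    return daily_primary_gb * (1 + replicas) * retention_days


def per_data_node_gb(total_gb: float, data_nodes: int) -> float:
    # Assumes shards are spread evenly; a dedicated master holds no data.
    return total_gb / data_nodes


def master_quorum(master_eligible: int) -> int:
    # Strict majority of master-eligible nodes needed to elect a master.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    total = total_storage_gb(25, replicas=1, retention_days=60)
    print(f"total on disk: {total:.0f} GB")
    print(f"per data node (2 data nodes): {per_data_node_gb(total, 2):.0f} GB")
    layouts = [
        ("3 nodes: 2 master+data, 1 dedicated master", 3),
        ("5 nodes: 3 dedicated masters, 2 data", 3),
    ]
    for name, eligible in layouts:
        quorum = master_quorum(eligible)
        print(f"{name}: quorum={quorum}, tolerates {eligible - quorum} master-eligible failure(s)")
```

Under these assumptions, 25 GB/day with one replica kept for 60 days comes to about 3000 GB, or about 1500 GB on each of the 2 data nodes in either layout. Both layouts have 3 master-eligible nodes, so both need 2 of them to elect a master and can lose 1; the 5-node layout mainly differs in keeping master duties off the data nodes.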