ELK Stack for 1TB logs per day from multiple services

We are trying to set up an ELK stack for two of our services, where each service produces about 1TB of logs per day. The logs are well structured, so there will hardly be any expensive grokking or filtering on the Logstash servers, and there are no multiline logs. In the near future we will add more services that produce about the same volume of logs.

A few administrators will monitor the data using Kibana, so I am guessing there will be 1 or 2 queries per second. There is also a requirement to query the logs of both services together (they might run some aggregations).

We want to retain logs for 3 days for each service. Can someone suggest the best possible deployment architecture (e.g. separate indices for each service, number of shards and replicas, logstash-forwarder or some other shipper, whether a broker is required)? The architecture should be horizontally scalable.

What should the hardware requirements be for the various machines (Logstash, the Elasticsearch cluster, Kibana)?

Hi Deb, these are great questions, but there is no one-size-fits-all deployment guide to answer them. You are asking the right questions, though :slight_smile:

Most of the hardware sizing questions are exploratory -- you need to perform tests on your hardware and work backwards.

For ES, I recommend reading the scaling section of the book "Elasticsearch: The Definitive Guide". See https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html

To figure out the # of shards, # of indices, and # of replicas, you would have to do the sizing exercise described here: https://www.elastic.co/guide/en/elasticsearch/guide/current/capacity-planning.html. This will give you numbers which can be extrapolated to your data requirements.
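The extrapolation step of that exercise is just arithmetic: find the largest shard one node handles comfortably in your test, then divide your daily volume by it. A minimal sketch (the 30GB per-shard limit below is a made-up placeholder; use the number your own test produces):

```python
# Rough shard-sizing arithmetic for the capacity-planning exercise above.
# All numbers are illustrative assumptions -- replace them with the results
# of your own single-shard tests on your own hardware.

def shards_per_daily_index(daily_volume_gb, max_shard_gb):
    """Shards needed so that no shard exceeds the size your test showed
    a single node can serve comfortably (rounds up)."""
    return -(-daily_volume_gb // max_shard_gb)  # ceiling division

daily_volume_gb = 1024   # ~1TB/day for one service, from the question
max_shard_gb = 30        # hypothetical per-shard limit from a sizing test

print(shards_per_daily_index(daily_volume_gb, max_shard_gb))  # 35
```

That count is per service per daily index, before replicas; replicas multiply the total storage, so a 3-day retention at 1 replica means roughly 6TB on disk per service.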

For LS, I recommend reading this: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
It describes different architectures depending on your scale.
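As a concrete sketch of one of the brokered architectures described there — shippers feeding a broker, with Logstash indexers pulling from it and bulk-indexing into ES — an indexer pipeline might look like this (the hostnames, Redis key, `service` field, and index pattern are all placeholder assumptions, and option names vary between Logstash versions):

```conf
# Logstash indexer: pull events from a Redis broker, index into Elasticsearch.
input {
  redis {
    host => "broker.internal"   # placeholder broker hostname
    data_type => "list"
    key => "logstash"           # list the shippers push to
  }
}
output {
  elasticsearch {
    hosts => ["es-node1:9200", "es-node2:9200"]   # placeholder ES nodes
    index => "logs-%{service}-%{+YYYY.MM.dd}"      # daily index per service
  }
}
```

The broker absorbs bursts and lets you add indexer instances independently of the shippers, which is what makes the tier horizontally scalable.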

Both LS and ES are designed to be horizontally scalable. I recommend starting small and scaling out as your needs grow. I would also recommend using time-based indices (since this is all log data), which lets you adjust the shard count for each new index as your volume grows.
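For example, you can apply shard and replica settings to every new daily index with an index template, so the counts are easy to change later without reindexing old data (the name, pattern, and counts below are placeholders, not a recommendation — derive yours from the sizing exercise):

```shell
# Index template: any new index matching logs-service1-* gets these settings.
curl -XPUT 'http://localhost:9200/_template/logs_service1' -d '{
  "template": "logs-service1-*",
  "settings": {
    "number_of_shards": 35,
    "number_of_replicas": 1
  }
}'
```

Keeping one template per service also gives you the separate per-service indices you asked about, while cross-service queries can still hit both with a pattern like `logs-*`.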

Lastly, check out the Curator tool to manage your indices according to your retention policy.
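A sketch of what that looks like for your 3-day retention, run daily from cron (this uses Curator 3.x CLI flags; the syntax differs in other Curator versions, and the index prefix is a placeholder):

```shell
# Delete daily indices older than the 3-day retention window.
curator --host localhost delete indices \
  --older-than 3 --time-unit days \
  --timestring '%Y.%m.%d' --prefix 'logs-'
```

With time-based indices, dropping a whole day's index this way is far cheaper than deleting individual documents.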