We are trying to set up an ELK stack for two of our services, each of which produces about 1 TB of logs per day. The logs are well structured, so there will be hardly any expensive grok or filter work on the Logstash servers, and there are no multiline logs. In the near future we will add more services producing roughly the same volume of logs.
A few administrators will monitor the data through Kibana, so I expect roughly 1 or 2 queries per second. There is also a requirement to query the logs of both services together (they may run some aggregations across them).
We want to retain logs for 3 days per service. Can someone suggest the best possible deployment architecture (separate indices per service, number of shards, replicas, logstash-forwarder or some other shipper, whether a broker is required)? The architecture should be horizontally scalable.
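To illustrate the cross-service requirement: even with separate per-service daily indices (hypothetical names like `service-a-2015.06.01` below), Elasticsearch can search several indices in a single request, so the admins' combined queries and aggregations remain possible. A sketch, assuming those index names:

```shell
# One search spanning both services' daily indices via wildcards.
# Index names and the query string are placeholders.
curl 'localhost:9200/service-a-*,service-b-*/_search?q=level:error'
```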
What are the hardware requirements for the various machines (Logstash, the Elasticsearch cluster, Kibana)?
Both Logstash and Elasticsearch are designed to be horizontally scalable, so I recommend starting small and scaling out as your needs grow. I would also recommend time-based indices (since this is all log data), which let you raise the shard count for new indices as your volume grows.
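As a rough sketch of the time-based-indices approach: an index template applies your shard and replica settings to every new daily index that matches the pattern, so you can change the settings for future indices without reindexing old ones. The template name, index pattern, and counts below are placeholders to adapt:

```shell
# Template matching daily indices such as service-a-2015.06.01.
# number_of_shards / number_of_replicas are starting points, not recommendations.
curl -XPUT 'localhost:9200/_template/service-logs' -d '{
  "template": "service-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
```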
Lastly, check out the Curator tool to manage your indices according to your retention policy.
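For the 3-day retention, a scheduled Curator run along these lines would do it. This assumes the Curator 3.x command-line syntax and daily indices whose names end in a `%Y.%m.%d` date suffix; adjust the host and prefix to your setup:

```shell
# Delete indices older than 3 days, matching the daily date suffix.
curator --host localhost delete indices \
  --older-than 3 --time-unit days --timestring '%Y.%m.%d'
```

Run it from cron once a day and the oldest day's indices are dropped as new ones are created.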