Suggestions - Hardware and software configurations for Logstash, Elasticsearch, and Kibana

What we want to set up is:

  • Kafka server => Logstash Kafka input plugin => Elasticsearch => Kibana (see the pipeline sketch below this list)

  • Use case => Real-time analysis

  • Analysis window => last 1 week of data
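
Roughly, the pipeline we have in mind is something like the sketch below. The broker address, topic name, and index pattern are only placeholders, and the exact option names depend on the plugin versions (older releases of the Kafka input use zk_connect/topic_id rather than bootstrap_servers/topics):

    input {
      kafka {
        # placeholder broker and topic - adjust to the real cluster
        bootstrap_servers => "kafka-broker:9092"
        topics            => ["app-logs"]
        codec             => "json"
        consumer_threads  => 4
      }
    }
    output {
      elasticsearch {
        # daily indices make it easier to manage and expire old data
        hosts => ["http://es-node:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }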

Below are our Kafka messaging system details:

  • Message rate into the Kafka server => 30,000 records / second
  • Message format => JSON
  • Message size => 5 Kb / record
  • Expected daily data size => 500 GB

Could you please give us some hardware and configuration suggestions? Here are some areas we are interested in:

  • AWS instance hardware configurations.
  • Memory (RAM) and CPU tuning for each component: the Logstash Kafka input plugin, Elasticsearch, and Kibana.
  • Hard disk requirements.
  • Data compression methods.
  • Other configurations such as sharding, replicas, etc.

With Elasticsearch it's best to experiment with hardware, as requirements vary with the type of content you are indexing, the number of fields, query rate, and the number of shards and replicas.

For me - we are currently using 20 nodes for 350 GB per day of log messages with over 1,000 distinct fields (each message contains up to 20-30 fields). We store data for the past 90 days but only keep indices open for the past 10 days.

  • Each node: a physical box with 64 GB RAM, 2 x 3 TB disks in RAID 0, and 12 cores
  • Number of shards - 20
  • Number of replicas - 3
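
One way to apply the shard and replica settings is through an index template so every new daily index picks them up, and then close anything older than the 10-day window. A rough sketch only - the host, template name, and daily logstash-* index naming are assumptions, and the template body uses the older ES 1.x-style API:

    # apply shard/replica settings to every new daily index
    curl -XPUT 'http://localhost:9200/_template/logstash' -d '{
      "template": "logstash-*",
      "settings": {
        "number_of_shards": 20,
        "number_of_replicas": 3
      }
    }'

    # close an index that has aged out of the 10-day "open" window (placeholder date)
    curl -XPOST 'http://localhost:9200/logstash-2015.01.01/_close'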

We don't use Kafka - instead we use logstash-forwarder and Logstash for receiving and processing logs.
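
On the receiving side that means the lumberjack input in Logstash, roughly like the sketch below (the port and certificate paths are placeholders; logstash-forwarder requires SSL certificates):

    input {
      lumberjack {
        # logstash-forwarder ships logs over the lumberjack protocol and requires SSL
        port            => 5043
        ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
        ssl_key         => "/etc/pki/tls/private/logstash-forwarder.key"
      }
    }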

It would be interesting to hear about other setups of similar or larger scale.