Trying to figure out what kind of hardware I need

Hi,

I have a project to deploy ELK to manage all OS logs (GNU/Linux, Windows) and network equipment logs.

I need to be able to handle:

  • 50 GB per day
  • 750 hosts
  • 4500 logs per second

That's for today; of course, my network will grow...

I am trying to figure out what kind of hardware I need to handle this.
I read these articles:

At the beginning I will set everything up on one box. I know it's not ideal, but that's my budget :( , and I need to demonstrate that ELK is a good solution.

So I will buy a system with 64 GB of RAM and a CPU with many cores: lower clock speed, but more concurrent processes...
But for the hard drives, it's hard to know what I should choose.
I want to keep:

  • 1 month quickly available
  • 3 months available through the web interface, not instantly, but without heavy manual operations
  • 6 months available; I think I will store this in flat files that people can grep through... but if you have another solution : ) .

In this article https://www.elastic.co/blog/elasticsearch-storage-the-true-story the author explains that the index expansion factor is between 0.553 and 1.399. So for 1 month: 50 GB * 30 days = 1500 GB, and 1500 GB * 1.399 (worst-case index expansion) = 2098.5 GB. That means about 2 TB in RAID 10 with 15K or 10K RPM hard drives, and that's just for the current month.
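The arithmetic above can be checked with a short Python sketch. It uses only the figures from this post (50 GB per day, 30 days, expansion factors 0.553 and 1.399); the function and variable names are made up for illustration. The RAID 10 line relies on the fact that RAID 10 mirrors every disk, so raw capacity is twice the usable capacity.

```python
# Figures taken from the post; names are illustrative only.
DAILY_GB = 50          # raw log volume per day
DAYS = 30              # one month kept quickly available
EXPANSION_MIN = 0.553  # best-case index expansion factor (from the article)
EXPANSION_MAX = 1.399  # worst-case index expansion factor (from the article)


def indexed_size_gb(daily_gb, days, expansion):
    """Estimated on-disk index size in GB for the given retention."""
    return daily_gb * days * expansion


best = indexed_size_gb(DAILY_GB, DAYS, EXPANSION_MIN)
worst = indexed_size_gb(DAILY_GB, DAYS, EXPANSION_MAX)

# RAID 10 mirrors every disk, so raw capacity is twice the usable capacity.
raid10_raw = worst * 2

print(f"best case : {best:.1f} GB")
print(f"worst case: {worst:.1f} GB")
print(f"raw disk for worst case in RAID 10: {raid10_raw:.1f} GB")
```

So the one-month index should land somewhere between roughly 830 GB and 2100 GB, and the worst case needs about 4.2 TB of raw disk once RAID 10 mirroring is counted.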

Now, how can I manage the older logs (3 months and 6 months)? The documentation https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html says to avoid network-attached storage (NAS), so do you have a suggestion for storing all this data?

If I keep the 6-month-old logs in flat files, I think I can store them on my NAS, but for the 3-month-old logs I have no idea.
My goal is to use slower, cheaper hard drives for the 3-month-old logs. Those logs will rarely be accessed, so access can take longer... any suggestions?

Thanks for your help. I hope I explained my goal and my problem clearly.

This tiered requirement is why having multiple nodes is good: you can then have different nodes with different hardware and "archive" data across them. This is known as a hot/warm (or hot/cold) architecture.
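As a rough illustration of the idea, here is a minimal Python sketch that decides which tier a daily index belongs to from its age. It assumes the retention windows in the question are cumulative ages (under 30 days is hot, 30 to 90 days is warm, 90 to 180 days is the flat-file archive). The `index.routing.allocation.require.<attribute>` index setting is how Elasticsearch pins an index to nodes carrying a given custom attribute; the attribute name `box_type` and the helper names are assumptions for this example, not something from the thread.

```python
from datetime import date

# Assumed cumulative age limits, in days, derived from the question.
HOT_DAYS = 30       # quickly available
WARM_DAYS = 90      # available through the web interface, slower disks
ARCHIVE_DAYS = 180  # flat files outside Elasticsearch


def tier_for(index_day, today):
    """Return the tier name for a daily index created on index_day."""
    age = (today - index_day).days
    if age < HOT_DAYS:
        return "hot"
    if age < WARM_DAYS:
        return "warm"
    if age < ARCHIVE_DAYS:
        return "archive"
    return "expired"


def allocation_settings(tier):
    """Index settings that require nodes tagged with the given tier.

    "box_type" is a hypothetical custom node attribute name; the nodes
    would need to be started with a matching attribute value.
    """
    return {"index.routing.allocation.require.box_type": tier}


today = date(2024, 6, 30)
for day in (date(2024, 6, 20), date(2024, 5, 1), date(2024, 2, 1)):
    print(day.isoformat(), tier_for(day, today))
```

The dictionary returned by `allocation_settings("warm")` is the kind of body you would send to the update index settings API once an index ages out of the hot tier; Elasticsearch then relocates its shards to the matching nodes.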

As for sizing, if you know the daily index rate and your retention period, it should be easy to figure out the capacity you need.
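A minimal sketch of that calculation in Python, using the numbers from the question. It assumes the worst-case expansion factor (1.399) for anything kept in Elasticsearch, a factor of 1.0 for uncompressed flat files, and cumulative retention windows (days 0 to 30, 30 to 90, 90 to 180); the tier names and the function are hypothetical.

```python
DAILY_GB = 50  # daily raw log volume from the question

# (tier name, first day, last day, size factor). All values are assumptions:
# 1.399 is the worst-case index expansion, 1.0 means uncompressed flat files.
TIERS = [
    ("hot", 0, 30, 1.399),
    ("warm", 30, 90, 1.399),
    ("archive", 90, 180, 1.0),
]


def tier_capacity_gb(daily_gb, start_day, end_day, factor):
    """Capacity in GB needed to hold the days in [start_day, end_day)."""
    return daily_gb * (end_day - start_day) * factor


capacity = {
    name: tier_capacity_gb(DAILY_GB, start, end, factor)
    for name, start, end, factor in TIERS
}

for name, gb in capacity.items():
    print(f"{name:8s} {gb:8.1f} GB")
print(f"{'total':8s} {sum(capacity.values()):8.1f} GB")
```

Changing `DAILY_GB` or the windows gives a quick way to see how the requirement moves as the network grows. Compressing the flat files would shrink the archive figure, but by how much depends on the data.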