ELK sizing architecture

Hi,

For moving to the production environment, I am planning to use the architecture below for ELK.
I am getting around 5 GB of data/day and I keep the data for 60 days; after 60 days I take a backup of the data and delete the old data using Curator.
I have one index/day and 30 applications, so 1800 indices in total.

I do not want to go for Curator, because with Curator I would have to create a huge number of indices.
So, if I do not use Curator, I will use delete by query instead:

{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-60d"
      }
    }
  }
}
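
For reference, the full request would look something like the sketch below, assuming Elasticsearch 5.x or later where the _delete_by_query API is built in (myapp-logs is a placeholder index name):

# Delete every document older than 60 days from a single combined index
POST /myapp-logs/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-60d"
      }
    }
  }
}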

Hardware architecture:

Total servers = 8

Dedicated master node: 1 (RAM: 64 GB, storage: 100 GB)
Master + data nodes: 2 (RAM: 64 GB, storage: 1 TB)
Dedicated data node: 1 (RAM: 64 GB, storage: 1 TB)
Client node + Kibana (backup): 1 (RAM: 64 GB, storage: 1 TB)
Dedicated Kibana: 1 (RAM: 64 GB, storage: 100 GB)
Servers to run Logstash: 2 (RAM: 64 GB, storage: 100 GB)

Do you see any flaws in the above architecture?
Please let me know how I can improve it further.

Thanks


I have one index/day and 30 applications, so 1800 indices in total.

1800 indexes is quite a lot given the per-index overhead. Why do you think it's necessary to have one index per application?

I do not want to go for Curator, because with Curator I would have to create a huge number of indices.

I don't get why Curator would require an increase in the number of indexes, especially given what you say in an earlier paragraph: "I am ... deleting the old data using curator". So, are you planning to use Curator or not?

Dedicated Kibana: 1 (RAM: 64 GB, storage: 100 GB)
Servers to run Logstash: 2 (RAM: 64 GB, storage: 100 GB)

You know you don't really need 64 GB RAM for either Kibana or Logstash, right?


I'm in complete agreement with everything @magnusbaeck has said, and will add to it regarding delete by query.

Delete by query is not a good idea as it puts an enormous I/O strain on your cluster. Every single doc must be first found by query, then flagged for deletion, and then purged at the next segment merge—which will be painful as a very large number of segments will need to be merged to compensate for the deletes. This is why deleting daily indices by way of Curator is preferred.
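
To make the contrast concrete: dropping an entire time-based index, which is essentially what Curator does behind the scenes, is a single lightweight call rather than a per-document rewrite (the index name below is just a placeholder):

# Remove one day's worth of data for one application in a single operation
DELETE /myapp-2016.05.01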

Are you saying you don't want to use Curator because you want to have different retention periods for all of your different applications, and that's why you need so many indices? Are you trying to segment access by index name? You'd be far better off indexing content by retention period than what you've described.

Is the above hardware architecture good enough to roll out to production, considering your advice that I will reduce the Kibana and Logstash RAM to 32 GB?

I don't get why Curator would require an increase in the number of indexes
Curator cannot delete part of an index, so if all my documents are in one index, I cannot delete only those that are older than 60 days. That is why I have to create an index on a per-day basis.

One index per day is still a good idea. But please help me understand why you need one index per application; that is less than ideal.

If you only have 5 GB per index, reduce the shard count from the default 5 to 1 (with 1 replica).
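
As a minimal sketch of how that could be applied automatically, an index template along these lines would give every matching daily index 1 primary shard and 1 replica (the template name and the myapp-* pattern are placeholders; on Elasticsearch 6.x and later the key is index_patterns instead of template):

PUT /_template/daily-logs
{
  "template": "myapp-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}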

Is the above hardware architecture good enough to roll out to production?

For 5 GB data/day over 60 days it should be sufficient, but I'm not convinced that your index and shard configuration is optimal.

considering your advice that I will reduce the Kibana and Logstash RAM to 32 GB

The default heap size of Logstash is 1 or 2 GB (and unless there's a memory leak bug, that's not a limit people tend to have to increase), so for Logstash you'd be fine with a 4 GB server. I'm not familiar with the RAM characteristics of Kibana, but it shouldn't need much considering that it's really just an HTTP server and proxy against ES.

KB needs bugger all too, a few gig should be heaps.

Will 4 GB RAM be sufficient to run 20-30 conf files at a time?

I need one index/application because each application is totally different from the others.
The second thing is that it is easier to manage the indices this way. If all the data went to one index, it would be hard to manage, wouldn't it?

That would be a good idea,
but I am not sure how it will affect Kibana's performance or other search queries. Right now, for one application we have 60 indices (1 shard/index), so any query is going to hit 60 indices to get the data.
Will it impact the performance of Kibana?
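
To make the concern concrete, a query over the whole series would go through a wildcard index pattern like the sketch below and touch one shard per matching daily index (myapp is a placeholder application name):

GET /myapp-*/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-7d"
      }
    }
  }
}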

Use weekly indices then maybe.

Also, LS will mostly use CPU, so make sure you have enough of that on those nodes.

Will 4 GB RAM be sufficient to run 20-30 conf files at a time?

The number of configuration files is irrelevant in itself. The memory footprint of Logstash should be largely constant unless you're using plugins with "memory", like multiline, aggregate, and the like, i.e. plugins that save some kind of state between events. And even then you'll only need significant amounts of memory if it needs to store significant amounts of information, e.g. track thousands of multiline streams.

The second thing is that it is easier to manage the indices this way. If all the data went to one index, it would be hard to manage, wouldn't it?

Why? I can only see two reasons for having multiple index series:

  • You have a multi-tenant environment and need to segregate the data for access control purposes.
  • ES's restriction that a particular field must be mapped the same way across all types in an index isn't acceptable (see the sketch after this list).
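
As a hypothetical illustration of the second point, two applications might need the same field name mapped differently, which separate index series can accommodate but a single shared index cannot; the index, type, and field names below are made up, and the field types assume Elasticsearch 5.x:

# Application A stores "status" as a number
PUT /app-a-2016.05.01
{
  "mappings": {
    "logs": {
      "properties": {
        "status": { "type": "integer" }
      }
    }
  }
}

# Application B stores "status" as a string code
PUT /app-b-2016.05.01
{
  "mappings": {
    "logs": {
      "properties": {
        "status": { "type": "keyword" }
      }
    }
  }
}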

Servers to run Logstash: 2 (RAM: 64 GB, storage: 100 GB)

Are you going to have a load balancer in front of the machines, will you use round robin DNS, or will you configure log clients to connect to either server?

It's not clear if 100 GB is for the whole machine (operating system and all), but you know that Logstash is very lean on disk, right?