I have an ELK stack running: 8 data nodes, 4 ingest nodes, 4 Logstash instances, and a shitload of applications that send data from multiple servers via Filebeat. These applications range from 4 to 24 servers at this location (the other half of the apps are at another location and send data to their own ELK stack). Each Logstash runs 4 pipelines, on ports 5044 to 5047. The busy apps have their own pipeline, while the smaller ones share one.
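To make the layout concrete, the pipelines.yml on each Logstash looks roughly like this (pipeline names and paths are made up for illustration; only the port split is real):

```yaml
# pipelines.yml on each Logstash instance -- names and paths are illustrative
- pipeline.id: busy-app-a          # dedicated pipeline, beats input on 5044
  path.config: "/etc/logstash/conf.d/busy-app-a.conf"
- pipeline.id: busy-app-b          # dedicated pipeline, beats input on 5045
  path.config: "/etc/logstash/conf.d/busy-app-b.conf"
- pipeline.id: busy-app-c          # dedicated pipeline, beats input on 5046
  path.config: "/etc/logstash/conf.d/busy-app-c.conf"
- pipeline.id: shared-small-apps   # smaller apps share this one, port 5047
  path.config: "/etc/logstash/conf.d/shared-small-apps.conf"
```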
The biggest and most important index receives around 531,746,569 documents a day, with more pressure in the late afternoon and evening and less at night and in the morning.
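Spelled out, that volume averages out as follows (the daily count is measured; the 2x evening peak factor is just my guess):

```python
# Average indexing rate of the main index. The 531,746,569 docs/day figure
# is measured; the 2x peak multiplier is an assumption on my part.
DOCS_PER_DAY = 531_746_569
SECONDS_PER_DAY = 86_400

avg_rate = DOCS_PER_DAY / SECONDS_PER_DAY   # ~6,154 docs/sec on average
peak_rate = 2 * avg_rate                    # assumed evening peak, ~12,309/sec

print(f"average: {avg_rate:,.0f} docs/sec, assumed peak: {peak_rate:,.0f} docs/sec")
```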
Sometimes there are peak moments when too much data comes in at once, or a network or storage event causes a hiccup. The stack then needs some time to catch up, but the volume of documents being sent is too high to do this quickly. So in the end I always have a gap of non-recoverable events.
I would like to prevent this. I've learned that people use Redis as a buffer to solve this, but what would be a decent setup?
Should I put a Redis instance in front of each Logstash? (I'm going to use Redis only for the pipeline on 5044 at first.) That would mean: filebeat --> 1 of 4 Redis --> 1 of 4 Logstash --> Elasticsearch
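For that Redis-in-front variant, I believe the wiring would look roughly like this (hostnames and the key name are placeholders):

```yaml
# filebeat.yml -- ship to Redis instead of directly to Logstash
output.redis:
  hosts: ["redis1:6379", "redis2:6379", "redis3:6379", "redis4:6379"]
  key: "filebeat-5044"   # Redis list key; name is my own choice
  db: 0
  loadbalance: true
```

```
# Logstash pipeline input -- pull events back out of the Redis list
input {
  redis {
    host      => "redis1"
    port      => 6379
    data_type => "list"
    key       => "filebeat-5044"
  }
}
```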
Or should I put Redis behind Logstash? filebeat --> 1 of 4 Logstash --> 1 of 4 Redis --> Elasticsearch?
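If I understand that variant right, the first Logstash layer would write to Redis with its redis output, and something (presumably a second Logstash layer, since Redis can't index into Elasticsearch by itself) would still have to drain the list. A sketch of the handoff, with placeholder host and key name:

```
# Logstash layer 1: buffer processed events into a Redis list
output {
  redis {
    host      => ["redis1"]
    data_type => "list"
    key       => "buffered-events"   # placeholder key name
  }
}
```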
Maybe even both? filebeat --> 1 of 4 Redis --> 1 of 4 Logstash --> 1 of 4 Redis --> Elasticsearch?
Are 4 Redis instances overkill, or undersized?
And since Redis is an in-memory database, what would be an appropriate amount of RAM?
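My back-of-envelope so far, mostly to frame the question (the ~1 KB average event size and the 1.5x Redis structure overhead are pure assumptions; please correct me if they're off):

```python
# Rough Redis sizing: how long an outage can one buffer absorb?
# The daily doc count is from my stack; the average event size and the
# Redis per-entry overhead factor are assumptions.
DOCS_PER_DAY = 531_746_569
docs_per_sec = DOCS_PER_DAY / 86_400        # ~6,154 docs/sec average

avg_event_bytes = 1_000                     # assumed ~1 KB per JSON event
redis_overhead = 1.5                        # assumed structure overhead factor
bytes_per_sec = docs_per_sec * avg_event_bytes * redis_overhead

ram_gb = 16                                 # candidate RAM per Redis node
buffer_seconds = ram_gb * 1024**3 / bytes_per_sec
print(f"{ram_gb} GB buffers ~{buffer_seconds / 60:.0f} min of average traffic")
# -> ~31 min of average traffic; less at the evening peak
```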
Are there other (better) options than Redis to consider?
Thanks in advance for your insights!