What do you think would be a good ELK architecture?
(needs described below)
Hi
We have several applications hosted on different URLs (e.g. app1.foo.com, app2.foo.com), and some of them share a domain but live under different paths (e.g. apps.foo.com/app1, apps.foo.com/app2)
Each application generates several log files (e.g. Apache, Tomcat)
All logs are indexed in ES, using indices named app:file-timestamp (e.g. "app1:apache-2016-07-15")
Events come from different sources (e.g. syslog, Filebeat)
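For reference, the index naming above maps to an elasticsearch output roughly like this (the `app` and `file` fields are our own, set earlier in the pipeline; this is a simplified sketch, not our exact config):

```conf
output {
  elasticsearch {
    hosts => ["es:9200"]
    # "app" and "file" are custom fields populated in the filter stage
    index => "%{app}:%{file}-%{+YYYY-MM-dd}"
  }
}
```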
ALL elements of the architecture are Dockerized (I really love Docker... if you don't, you should!)
It is desirable to have a zero-loss processing pipeline, although we know Logstash is still not 100% reliable for this, so we can live without that guarantee.
After reading "Deploying and Scaling Logstash", and with those needs clear, we started experimenting.
Our first try was one HUGE Logstash pipeline (many, many filter files, around 900). With Docker we scale it as needed.
This seems great for squeezing host resources, but it's quite ugly to maintain.
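Conceptually, that single big pipeline is a pile of per-application conditionals like the sketch below (field layout assumes Filebeat's default `fields` namespace; the app names and grok patterns here are just illustrative):

```conf
filter {
  # route each event to its application's filters
  if [fields][app] == "app1" {
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  } else if [fields][app] == "app2" {
    grok { match => { "message" => "%{TOMCATLOG}" } }
  }
  # ...repeated across hundreds of filter files
}
```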
On our second try, we built one Logstash pipeline per application, but this approach has other problems: wasted resources and too many containers for such humble needs (around 300).
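To give an idea of the container sprawl, the compose file ends up as one near-identical service per application, something like (image tag and paths are illustrative):

```yaml
version: "2"
services:
  logstash-app1:
    image: logstash:2.3
    command: logstash -f /config
    volumes:
      - ./pipelines/app1:/config
  logstash-app2:
    image: logstash:2.3
    command: logstash -f /config
    volumes:
      - ./pipelines/app2:/config
  # ...and so on, once per application
```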
On our third try, we are grouping applications by department, with one pipeline per department... but to be honest, that grouping doesn't always make sense.
We have even considered a possible fourth scenario, where logs are indexed raw into ES and some Logstash workers post-process them, reindexing them into the final index and removing the raw version.
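That fourth scenario could be sketched as a worker pipeline that reads the raw index back out of ES and writes the enriched version to the final index (index names and fields are hypothetical; cleanup of the raw documents would be a separate step):

```conf
input {
  elasticsearch {
    hosts => ["es:9200"]
    # pull events previously indexed raw
    index => "raw-logs"
  }
}
filter {
  # enrichment/parsing happens here instead of at ingest time
}
output {
  elasticsearch {
    hosts => ["es:9200"]
    index => "%{app}:%{file}-%{+YYYY-MM-dd}"
  }
}
```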
For all of this, we are using Redis as a pipeline connector, although I don't feel very confident about the persistence it offers, and I also miss a retention feature like Kafka's. In the end, less is more, and we will probably remove it from the architecture.
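The Redis connector itself is simple: shippers push onto a list and indexers pop from it (host and key names below are illustrative). It buffers bursts, but unlike Kafka, once an event is popped it's gone, and there is no replay window:

```conf
# shipper side
output {
  redis {
    host      => "redis"
    data_type => "list"
    key       => "logstash"
  }
}

# indexer side
input {
  redis {
    host      => "redis"
    data_type => "list"
    key       => "logstash"
  }
}
```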
So... what do you think about this? Any advice? Any bright ideas?
Thanks in advance
Regards