I have been using Logstash for over 2 years and love it.
One frustration I have had is scaling Logstash across many (100+) different data types. (These are not just dissimilar syslog variants but completely separate data-type ingestions.)
- Logstash configuration (even when split across multiple files) becomes unwieldy.
- My current config is nearing 10,000 lines.
- Bleed-over between data types can easily occur when a data set's code is mislabeled.
- Using Kafka topics with multiple separate Logstash pipelines doesn't scale well; my current data volume needs auto-scaling.
My current solution is a mix of multiple Kafka topics and multiple data sets sharing the same config.
Here is what I want to do (I've written a little code for this, but keep getting distracted by other ideas), and I'm curious whether anyone else in the community has already attempted it, in which case maybe I can tie in and help.
A Logstash / Docker configuration runner that allows auto scaling.
Basically, the following...
- Utilize a "Logstash Module" for different data sets.
- A module would contain all configuration, dictionaries, and grok expressions for a single data set, plus a plugin manifest listing the Logstash plugins needed to run the config.
- A REST service that takes a JSON object specifying the data input and output methods, the modules to use for the config, and environment variables.
- A config generator based on requested Logstash modules
- A service that tests generated configs from the modules.
- A Docker orchestrator that spins up one or more Logstash containers with the necessary plugins and applies the config from the generator. This will let me auto-scale Logstash containers based on, for instance, queue depth in Kafka.
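To make the "Logstash Module" idea concrete, here is a rough sketch of what a module's plugin manifest might look like. Every name and field here is hypothetical and just illustrates the shape; the plugin names are real Logstash filter plugins:

```yaml
# manifest.yml — hypothetical manifest for a "cisco_asa" module
name: cisco_asa
version: 0.1.0
description: Cisco ASA firewall log ingestion
# Logstash plugins this module's config requires
required_plugins:
  - logstash-filter-grok
  - logstash-filter-translate
  - logstash-filter-dissect
# files bundled with the module, relative to the module root
config_files:
  - pipeline/filter.conf
dictionaries:
  - dictionaries/event_codes.yml
```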
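The request body for the REST service might look something like this (all field names are made up, just showing the input/output/modules/environment split described above):

```json
{
  "pipeline_id": "asa-prod",
  "input":  { "type": "kafka", "topic": "asa-raw", "bootstrap_servers": "kafka:9092" },
  "output": { "type": "elasticsearch", "hosts": ["http://es:9200"], "index": "asa-%{+YYYY.MM.dd}" },
  "modules": ["cisco_asa"],
  "environment": { "KAFKA_GROUP_ID": "asa-prod" }
}
```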
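A minimal sketch of the config generator, assuming the request shape above and a `modules/<name>/pipeline/filter.conf` layout (both assumptions, not a settled design). It just wraps the modules' filter configs with a generated Kafka input and Elasticsearch output:

```python
from pathlib import Path

def render_kafka_input(spec):
    # bootstrap_servers and topics are real logstash-input-kafka options
    return (
        'input {\n'
        '  kafka {\n'
        f'    bootstrap_servers => "{spec["bootstrap_servers"]}"\n'
        f'    topics => ["{spec["topic"]}"]\n'
        '  }\n'
        '}\n'
    )

def render_es_output(spec):
    # hosts and index are real logstash-output-elasticsearch options
    hosts = ", ".join(f'"{h}"' for h in spec["hosts"])
    return (
        'output {\n'
        '  elasticsearch {\n'
        f'    hosts => [{hosts}]\n'
        f'    index => "{spec["index"]}"\n'
        '  }\n'
        '}\n'
    )

def generate_config(request, modules_dir):
    """Concatenate input block, each module's filters, and output block."""
    parts = [render_kafka_input(request["input"])]
    for name in request["modules"]:
        filter_conf = Path(modules_dir) / name / "pipeline" / "filter.conf"
        parts.append(filter_conf.read_text())
    parts.append(render_es_output(request["output"]))
    return "\n".join(parts)
```

The config-testing service could then be little more than running the generated file through `logstash --config.test_and_exit -f generated.conf` and reporting the exit code.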
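For the auto-scaling piece, the core decision is mapping Kafka consumer lag to a container count. Here is a library-free sketch of that heuristic; the thresholds are made-up tuning knobs, and in practice the lag would come from Kafka's consumer-group offsets and the scaling action from the Docker API:

```python
import math

def desired_replicas(consumer_lag: int,
                     lag_per_replica: int = 50_000,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Map total Kafka consumer lag (unconsumed messages across the
    topic's partitions) to a Logstash container count.

    lag_per_replica is a rough guess at the backlog one container can
    work through in a reasonable time — purely illustrative.
    """
    if consumer_lag <= 0:
        return min_replicas
    want = math.ceil(consumer_lag / lag_per_replica)
    return max(min_replicas, min(want, max_replicas))
```

A periodic loop would poll the lag, call this, and ask the orchestrator to converge the running container count to the result.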
Are there any similar Logstash projects out there?