I'm trying to use the Elasticsearch community edition for the following scenario.
I have 20,000+ devices spread across thousands of geographically distributed offices. Every office has its own private network. I want to collect the logs from all devices into a central Elasticsearch cluster deployed at the head office. To do this, I evaluated several options:
- Deploy Filebeat on each device and configure it to ship its logs to the central Logstash/Elasticsearch deployed at the head office.
- Deploy Logstash at every office; each Logstash instance writes logs to the central Elasticsearch at the head office.
- Develop a custom aggregator beat (using the Beats library, libbeat) to be deployed at each office. Filebeat on the devices reports logs to the office-level aggregator, which forwards them to Logstash + Elasticsearch at the head office.
My requirements are:
- Guaranteed delivery of logs to the central system
- Higher acceptance (write) throughput at the central repository
- Control over the data flow (throttling/compression/bulk push) from the edges to the central system
- The offices' local networks must not be saturated by Filebeat retries when the central Logstash is inaccessible
- High Availability
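
For concreteness, here is a rough sketch of the Filebeat settings I have in mind for the first option, touching the compression, bulk push, and retry backoff requirements above. The host names, log path, and numeric values are placeholders I made up for illustration; I have not tested or tuned them.

```yaml
# Sketch only: host names, paths, and numeric values are placeholders, not tested settings.
filebeat.inputs:
  - type: filestream
    id: device-logs                 # hypothetical input id
    paths:
      - /var/log/device/*.log       # hypothetical log location

# In-memory queue that batches events before they are sent.
queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s

output.logstash:
  # Hypothetical head-office Logstash endpoints.
  hosts: ["logstash1.hq.example.com:5044", "logstash2.hq.example.com:5044"]
  loadbalance: true        # spread batches across the listed hosts
  compression_level: 3     # gzip level, 0 (off) to 9
  bulk_max_size: 1024      # maximum events per batch
  backoff.init: 5s         # wait before the first reconnect attempt
  backoff.max: 300s        # upper bound for the exponential backoff
```

My understanding is that Filebeat records read offsets in its registry and resends events that were not acknowledged, which should give at-least-once delivery, and that the backoff settings limit how often it retries while Logstash is unreachable. I am not sure whether settings like these are enough at this scale, which is why I am also considering the other two options.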
Which option would you recommend for this scenario?
Are there any better options?
Thank you in advance!