Basic architecture for global rollout?

I have a single instance of Elasticsearch 5.x up and running, and in 4 weeks it has indexed about 30 million documents. The performance is great and I was able to evaluate everything I need. However, it is difficult to tell how many documents will be created after the global rollout.

I now need to set up a starting infrastructure to roll out Elasticsearch globally across the whole company.

We have 9 offices in 4 countries, all connected via VPN.

We want central monitoring (dashboards) of the major IT systems at all sites. The question now is how to set up a good infrastructure that we can scale out if necessary. I can't find any information about a good design.

A central cluster with all data nodes in one data center? Or can I split this across different sites?

Redis installed at each data source and, in addition, in front of the Elasticsearch cluster? I need to cover the case where a site gets disconnected. Of course, I don't want to lose the information that is produced during an outage.

I'm a little bit lost and don't really know where to start.

Any advice or recommendations?

Kind regards,
Thorsten

Setting up Elasticsearch clusters spanning data centres is not supported, so the most common solution in this type of case is to set up a central cluster in one location. If all your inputs are file-based, Filebeat is usually able to handle network connectivity issues well without losing any data, as it can simply stop reading files until the error has disappeared. If you have non-file-based inputs or a very aggressive rollover policy for your logs, you may need to introduce a buffering mechanism at each site. The latest version of Logstash introduced a disk-based persistent queue, but you can also use some kind of message queue.
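
As a rough illustration of the persistent queue option, a per-site Logstash instance could enable it in logstash.yml along these lines. This is only a sketch: the queue directory and the size limit are assumptions made for the example and would need to be sized for the longest outage you want to survive.

```yaml
# logstash.yml on the Logstash instance at each site (sketch, values are assumptions)

# Buffer events on disk instead of in memory, so events that have been
# received but not yet shipped survive a restart or an unreachable output.
queue.type: persisted

# Hypothetical directory; pick a disk with enough free space.
path.queue: /var/lib/logstash/queue

# Upper bound for the on-disk queue. Once it is full, Logstash applies
# back-pressure to its inputs rather than dropping events.
queue.max_bytes: 4gb
```

Note that once the queue reaches its size limit, inputs are slowed down or blocked, so senders that cannot wait (for example UDP syslog) can still lose data during a long outage.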

When it comes to efficiently transferring data from the various sites to the central location, communication between Filebeat and Logstash, as well as between two Logstash instances, can be encrypted and compressed in order to make this as efficient as possible. You can typically also alter the batch size to suit your use case.
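
For the Filebeat side, the relevant settings live in the Logstash output section of filebeat.yml. The following is a minimal sketch, not a tuned configuration: the host name and certificate path are placeholders I made up, and the numeric values are simply the documented defaults, shown so you can see where to adjust them.

```yaml
# filebeat.yml, output section only (sketch, host and paths are placeholders)
output.logstash:
  # Hypothetical address of the central Logstash instance.
  hosts: ["logstash.central.example.com:5044"]

  # gzip level from 0 (off) to 9 (smallest, most CPU).
  compression_level: 3

  # Maximum number of events sent in one batch.
  bulk_max_size: 2048

  # Enables TLS; hypothetical path to the CA that signed the Logstash certificate.
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
```

On slow VPN links a higher compression level trades CPU on the sender for less bandwidth, so it is worth testing a couple of values with your own data.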
