We are in the process of establishing Elasticsearch at enterprise scale, spanning servers in two primary data centers, one in the US and one in Europe. We are using version 5.5.x, and our goal is to have a single view of enterprise data across both DCs.
I am trying to determine the right approach.
Is a single cluster spanning both DCs feasible?
If not, what kind of replication scheme should be used to replicate data between the clusters in each DC?
Do you recommend using Kafka as a queuing mechanism, or Logstash's own persistent queues instead?
Requirements:
Our primary objective is an Elasticsearch environment that can aggregate log files from all servers in both data centers (US and EU).
We are looking for a unified view of all collected log messages (via Kibana or a similar interface).
The search and indexing latency must be minimal.
I looked at this page and felt that the "Independent Elasticsearch and Kafka Clusters" option best suited our requirements.
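As a rough illustration of how we understand that option, here is a toy Python model. It assumes each DC has its own queue (standing in for a Kafka topic) and its own index (standing in for an Elasticsearch cluster), and that each DC runs consumers reading both the local and the remote queue. The class and method names are hypothetical, and no real Kafka or Elasticsearch client is used.

```python
from collections import defaultdict

# Hypothetical stand-ins only: "topic" models a per-DC Kafka topic and
# "index" models that DC's Elasticsearch cluster. No real APIs are used.

class DataCenter:
    def __init__(self, name):
        self.name = name
        self.topic = []                   # append-only log, like a Kafka partition
        self.index = {}                   # doc_id -> event, like a local ES index
        self.offsets = defaultdict(int)   # consumer offset per source DC

    def produce(self, doc_id, message):
        # Shippers on local servers publish only to the local topic.
        self.topic.append({"id": doc_id, "dc": self.name, "message": message})

    def consume_from(self, source):
        # Read new events from a (local or remote) topic and index them
        # into the LOCAL cluster, so each DC ends up with all events.
        start = self.offsets[source.name]
        for event in source.topic[start:]:
            # Reusing the event id means a re-delivered event overwrites
            # the existing document instead of creating a duplicate.
            self.index[event["id"]] = event
        self.offsets[source.name] = len(source.topic)


def sync(dcs):
    # Every DC consumes its own topic and every remote topic.
    for dc in dcs:
        for source in dcs:
            dc.consume_from(source)


us, eu = DataCenter("us"), DataCenter("eu")
us.produce("us-1", "GET /health 200")
eu.produce("eu-1", "disk usage 81%")
sync([us, eu])
print(sorted(us.index) == sorted(eu.index))  # True
```

Reusing the event id as the document id is our assumption for keeping replays after a failure idempotent.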
Questions:
For our requirements, would a dedicated Elasticsearch cluster and Kafka instance in each data center be appropriate? If so, can I use a simple Logstash service to replicate data from the remote DC?
That page is over two years old, and Elasticsearch has added many features since then, so I wanted a fresh perspective on the architecture it describes. With the features available in 5.5.x, is that architecture still viable?
America DC:
(1) Client Nodes
(2) Logstash Node (the logs generated by all clients in this DC are processed by this Logstash node)
Europe DC:
(1) Client Nodes
(2) Logstash Nodes (the logs generated by all clients in this DC are processed by these Logstash nodes)
(3) Elasticsearch Server
(4) Elasticsearch Database
(5) Kibana
The data collected by Logstash in the America DC will be transferred to the Elasticsearch server in the Europe DC.
Also, the Elasticsearch data volume is expected to grow rapidly this year. Do you foresee any problems with this architecture that would affect Elasticsearch performance, or is it sustainable?
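To make the cross-DC part of this flow concrete, here is a toy Python model of the America DC Logstash node shipping to the single cluster in Europe. It assumes the node buffers events in a bounded local queue (standing in for a persistent queue or Kafka topic) and sends them in batches over the WAN. The capacity, batch size, and all class names are hypothetical; this does not use the real Logstash or Elasticsearch APIs.

```python
from collections import deque

# Hypothetical model: RemoteCluster stands in for the Europe DC cluster,
# ShippingNode for the America DC Logstash node with a bounded local queue.

class RemoteCluster:
    def __init__(self):
        self.docs = []
        self.reachable = True

    def bulk_index(self, batch):
        if not self.reachable:
            raise ConnectionError("WAN link to Europe DC is down")
        self.docs.extend(batch)


class ShippingNode:
    def __init__(self, target, capacity=1000, batch_size=100):
        self.target = target
        self.queue = deque()
        self.capacity = capacity
        self.batch_size = batch_size

    def accept(self, event):
        # A full queue means backpressure: the sender must slow down or retry.
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(event)
        return True

    def flush(self):
        # Ship in batches; on failure, keep the batch queued for a later retry.
        while self.queue:
            n = min(self.batch_size, len(self.queue))
            batch = [self.queue[i] for i in range(n)]
            try:
                self.target.bulk_index(batch)
            except ConnectionError:
                return
            for _ in range(n):
                self.queue.popleft()


europe = RemoteCluster()
america = ShippingNode(europe, capacity=3, batch_size=2)
europe.reachable = False
accepted = [america.accept(f"event-{i}") for i in range(4)]
america.flush()   # link down: nothing shipped, events stay queued
europe.reachable = True
america.flush()   # link restored: backlog drains in batches
print(accepted, len(europe.docs), len(america.queue))
```

In this model the queue capacity determines how long a WAN outage can last before backpressure reaches the sending servers.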