Have Kibana get input from 2 Elasticsearch clusters with the same indexes

Hi,

After a successful PoC, we want to cluster our Elasticsearch setup and make it geo-redundant. The servers we are collecting data from are divided over 2 locations. What I would like to achieve is that, just like now, all servers are visible as 1 index in Kibana, but the ES backend should consist of 2 clusters. Both clusters are set up exactly the same, and each indexes data from its own location. This way, when a calamity occurs, the other location will continue to run, and the data gathering and monitoring for that location keep functioning. That matters more than ever at that point, because all load will be concentrated on 1 location instead of 2.

My idea was to set up a load balancer that connects users to 1 of the Kibana instances (each location has 1). Both Kibana instances should be able to display data from both locations. I read something about cross cluster search, which sounds promising, but I am not sure it does exactly what I need.

I think it is not difficult (correct me if I am wrong) to get everything into different indexes, but can I make Kibana display our graphs and data from both locations, just like now with only 1 Kibana and 1 ES server?

Hope you get what I mean.

Cheers, Eric

Hi Eric,

This is a great question. Do your two Elasticsearch instances have to be in separate clusters? Are they in different data centers? Different regions? I'm asking in case there's an option to combine them into a single cluster, but I gather that isn't the case, especially since you're using the separate clusters for a disaster recovery scenario.

Do your two Elasticsearch clusters currently replicate data between them at all? You said "same indexes" and I'm not sure if you only mean same index names or actually all the same data as well. For disaster recovery, I think you'd need data replication. Do you want the same index data to exist in both clusters?

Cross cluster search seems most useful if you have different index data in your two data centers or clusters, as then you can do a search across all the unique data. But maybe you're just looking for high availability?

Can you explain a little more about the above? Then I can give you an informed answer.

Thanks for your reply.
The cluster is still in the design phase, so maybe that's the easy part :wink:

I don't think replication between the clusters is needed. I think it is enough for each location to record its own data. You see, we use ES to monitor a cluster of servers that act as a kind of request gateway to our platform. At the moment I have 8 of those servers in each location, and this will double during the course of this year. These 32 servers will act as a gateway for up to 3 million clients in the busiest hours.

You can imagine this number of clients generates way too much logging for engineers to check logs per server to pinpoint issues, hence the use of ES/Kibana. By the way, yes, the locations are 2 different data centers in 2 different regions. I recall reading in several places online that spanning a cluster (or replicating nodes) over a WAN is not recommended, and the more data is involved, the less recommended it is?

Kibana is brilliant in its simplest form, simply by being able to quickly search through these log lines in a certain time period. Again, brilliant for troubleshooting.

Back to the use case. When 1 location is somehow totally down (for example a power outage: the servers will keep running, but communication may be interrupted), all customer traffic will pass through the other location. In that case that other location still needs to be monitored. When both locations are active, I want the Kibana interface to run my search query over the logs of all 32 machines. When 1 location is down, it's just missing that 1 location. A problem for our services, but it should not be a problem for Kibana :wink:

So the data may be separate, but in the view it should look like 1 index. I read something about index aliases as well, but have not had the time to dig deeper into that yet.

There's no need for combined data as in joins, but when both locations are up I'd like to see both locations combined in the listing on the Discover page and in our graphs.

As I said, I will need 2 Logstash instances and 2 Kibana instances as well, because these also need to keep functioning when the other location is down.
When both locations are up, both Kibana instances should give the same results. When 1 of the sides is down, the Kibana on the other side should just show the logging of its own side.

Hope this clears things up a bit. This is the idea I have now. It may change along the way, but I have to look at performance as well, so replication across locations is not what I have in mind in the first place.

Regards

Thanks Eric!

Cross cluster search (CCS) is what you want. It sounds like you're looking for a federated search solution. We used to provide something called a Tribe node for this, but it has been deprecated in favor of CCS (and has now been removed from our code base entirely).

Your idea of an LB in front of both clusters is roughly the idea we had with the Tribe node, so I'm linking you to an article comparing the Tribe and CCS solutions, where you can see a concrete example and the reasoning behind this.

Here's the full doc for configuring your clusters for CCS, which is what you want.
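For reference, a minimal sketch of the core configuration step, assuming a recent Elasticsearch version where a remote cluster is registered through the `cluster.remote.<alias>.seeds` setting on the `_cluster/settings` API (older 6.x releases used the `search.remote.` prefix instead). The alias `dc2`, the host names and the ports are hypothetical placeholders. The Python below only builds the request so you can inspect it; it does not send anything.

```python
import json
import urllib.request


def remote_cluster_settings(alias: str, seeds: list[str]) -> dict:
    """Build a persistent cluster-settings body registering one remote cluster.

    `alias` becomes the prefix used in searches (e.g. "dc2:logstash-*");
    `seeds` are transport-layer addresses (host:port, usually port 9300).
    """
    return {
        "persistent": {
            f"cluster.remote.{alias}.seeds": seeds,
        }
    }


def settings_request(es_url: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT to the _cluster/settings endpoint."""
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical hosts: the cluster in location 1 learns about location 2.
body = remote_cluster_settings("dc2", ["es-dc2-node1:9300", "es-dc2-node2:9300"])
req = settings_request("http://es-dc1:9200", body)
print(req.get_method(), req.full_url)
print(json.dumps(body, indent=2))
# Passing req to urllib.request.urlopen() would actually send it to the cluster.
```

You would do the mirror image on the other location's cluster (registering `dc1` there), so that each Kibana can reach both locations. Note that the seeds point at the transport port of the remote nodes, not the HTTP port.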

The Kibana side of this is quite simple: each of your Kibana instances can run cross cluster requests. Take a look here. For example, `*:logstash-*` is an index pattern that searches all logstash indices across all remote clusters that have been configured for cross cluster search, and `cluster-*:*` searches all indices across all remote clusters whose alias starts with `cluster-`. And so forth.
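To make the pattern semantics concrete, here is a small, hypothetical model of how a comma-separated CCS-style pattern selects its targets: expressions with a `cluster:` prefix are matched against registered remote cluster aliases, and expressions without a prefix against the local cluster. It uses Python's `fnmatch` wildcards as a stand-in for Elasticsearch's own matching, and the cluster and index names are made up. One consequence worth noting: `*:logstash-*` on its own does not include the local cluster's indices, so a pattern like `logstash-*,*:logstash-*` is the usual way to cover both (check that your Kibana version accepts comma-separated index patterns).

```python
from fnmatch import fnmatchcase


def expand(pattern: str, local: list[str], remotes: dict[str, list[str]]) -> list[str]:
    """Approximate the indices a comma-separated CCS-style pattern targets.

    "alias:index" expressions are matched against remote clusters only;
    expressions without a colon are matched against the local cluster.
    Shell-style wildcards stand in for Elasticsearch's own matching rules.
    """
    hits: list[str] = []
    for expr in (part.strip() for part in pattern.split(",")):
        if ":" in expr:
            cluster_pat, index_pat = expr.split(":", 1)
            for alias, indices in remotes.items():
                if fnmatchcase(alias, cluster_pat):
                    hits += [f"{alias}:{i}" for i in indices if fnmatchcase(i, index_pat)]
        else:
            hits += [i for i in local if fnmatchcase(i, expr)]
    return hits


# Hypothetical cluster aliases and index names.
local = ["logstash-2018.03.01", ".kibana"]
remotes = {
    "cluster-dc2": ["logstash-2018.03.01", "metrics-2018.03.01"],
    "archive": ["logstash-2018.01.15"],
}

for p in ["*:logstash-*", "cluster-*:*", "logstash-*,*:logstash-*"]:
    print(p, "->", expand(p, local, remotes))
```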

The ingest side you described makes sense: one Logstash instance feeding data to each cluster.

Yup, this is possible with the setup I described.

You'll still be running searches across all clusters, and if one cluster is down, the search simply runs on whatever is available, as long as the remote clusters are configured with `skip_unavailable` set to true.
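A minimal sketch of the setting behind that behavior, assuming a recent Elasticsearch version where remote clusters are configured under `cluster.remote.<alias>`. With `skip_unavailable` set to true, a cross cluster search that includes an unreachable remote returns results from the clusters that did respond instead of failing outright; depending on your version, the default may be false. The alias is hypothetical, and the resulting body would be sent as a PUT to `_cluster/settings`.

```python
import json


def skip_unavailable_settings(aliases: list[str]) -> dict:
    """Build a persistent cluster-settings body marking remotes as optional.

    With skip_unavailable=true, an unreachable remote is skipped during a
    cross cluster search instead of failing the whole request.
    """
    return {
        "persistent": {
            f"cluster.remote.{alias}.skip_unavailable": True for alias in aliases
        }
    }


# Hypothetical alias: on the location-1 cluster, the other location is "dc2".
print(json.dumps(skip_unavailable_settings(["dc2"]), indent=2))
```

On each location's cluster you would mark the other location's alias this way, so that during an outage each Kibana keeps showing the logging of its own side.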

You can continue to have two Kibana and Logstash instances.

Does that help?

Yes, thanks. I will try to set up a clustered environment to test this.
